McGraw-Hill Paperback 


Programming 
Techniques 


Blaise W. Liffick, Editor 


Volume 4: Bits and Pieces






The authors of the programs provided with this book have carefully 
reviewed them to ensure their performance in accordance with the specifica- 
tions described in the book. Neither the authors nor BYTE Publications Inc, 
however, make any warranties whatever concerning the programs, and 
assume no responsibility or liability of any kind for errors in the programs or 
for the consequences of any such errors. The programs are the sole property 
of the authors and have been registered with the United States Copyright Of- 
fice. 


Copyright © 1979 BYTE Publications Inc. All Rights Reserved. Portions of 
this book were previously Copyright © 1977, 1978 or 1979 by BYTE Publica-
tions Inc. BYTE and PAPERBYTE are trademarks of BYTE Publications Inc. No
part of this book may be translated or reproduced in any form without the 
prior written consent of BYTE Publications Inc. 


Programming Techniques. 


CONTENTS: v. 1. Program Design.—v. 2. Simulation.—v. 3. Numbers in 
Theory and Practice.—v. 4. Bits and Pieces. 

1. Electronic Digital Computers— Programming. 2. Computer Simulation. 
3. Mathematics—Data Processing. I. Liffick, Blaise W.
QA76.6.P7518 001.642 78-8649 
ISBN 0-07-037828-2(V. 4) 
Bits & Pieces 


TABLE OF CONTENTS 


From the Editor


SOFTWARE SYSTEMS
About This Section

A Real-Time Executive for Your Microcomputer
Wayne Crutchley

Multiprogramming Simplified
Irwin Lahasky (BYTE magazine December 1977)

Introduction to Multiprogramming
Mark Dahmke (BYTE magazine September 1979)

An Introduction to Multiprocessing
[author name illegible]

Microcomputer Time-Sharing
Kenneth J Johnson (BYTE magazine April 1979)

Time-sharing: Squeezing the Most from Your Micro
Sheldon Linker (BYTE magazine June 1979)

Designing a Command Language
G A Van de Bout (BYTE magazine June 1979)

Linking and Loading
[author name illegible]


DATA HANDLING
About This Section

Sorting It Out
[author name illegible]

Computer Information Arrangement
David Hollady (BYTE magazine October 1977)

Computer Information Arrangement (Update #1)
[author name illegible]

Table Manners: An Introduction and Guide to Table Handling and Techniques
Timothy [surname illegible]

Variables Whose Values Are Strings
W D Maurer (BYTE magazine October 1979)

Subroutine Parameters
W D Maurer (BYTE magazine July 1979)



Easy-to-Use Hashing Function
Don Kinzer (BYTE magazine October 1979)

Text Compression
James L Peterson (BYTE magazine December 1979)


ADVANCED TECHNIQUES
About This Section

The Algebra for the Boolean Exclusive-OR, With an Application to Hamming Codes
[author name illegible]

Stacks in Microprocessors
T Radhakrishnan, M V Bhat (BYTE magazine June 1979)

Stack It Up
Charlton H Allen (BYTE magazine November 1979)

What Is an Interrupt?
R Travis Atkins (BYTE magazine March 1979)

A Little Bit on Interrupts
Robert R Weir (BYTE magazine December 1977)

Optimization: A Case Study
William B Noyce (BYTE magazine April 1978)

Low-Level Program Optimization
James Lewis (BYTE magazine October 1979)

Queuing Theory, The Science of Wait Control, Part 1
Len Gorney (BYTE magazine April 1979)

Queuing Theory, The Science of Wait Control, Part 2
Len Gorney (BYTE magazine May 1979)

An Introduction to BNF
W D Maurer (BYTE magazine January 1979)

Aids for Hand-Assembling Programs
Erich A Pfieffer (BYTE magazine May 1979)

An Introduction to Polish Postfix Notation
[author name illegible]

Microprocessor-Memory Testing
[author name illegible]

An Algorithm for Drawing Lines
Louis J Cesa, Robert B Hitchcock, Eduardo Kellerman


FROM THE EDITOR 


Programming Techniques is a series of books specifically designed to help make 
programming easier and more enjoyable for the personal-computer enthusiast. This is 
done by providing articles which detail successful techniques for designing and im- 
plementing programs. Each book is a collection of the best articles on the selected 
subject from past issues of BYTE, The Small Systems Journal, plus new material which 
has not appeared in print before. This provides the reader with vital information from 
previous BYTE issues which might have been missed, new material that has not ap- 
peared in the magazine, plus a book covering one specific theme for quick, easy 
reference. 

The first volume in this series, Program Design (ISBN 0-931718-12-0), provides a look
at several different methods for designing programs more efficiently and effectively. 
Included in the topics covered are structured-program design, modular-programming 
techniques, program-logic design, designing tables, and binary-tree processing. 

Volume 2 is Simulation (ISBN 0-931718-13-9). Its purpose is to familiarize the reader 
with both a general overview of the vast area of computer simulation as well as details 
of specific types of simulations. The term simulation can cover a lot of territory, but 
for this book only three categories were chosen: artificial intelligence, motion, and ex- 
perimentation. 

Numbers in Theory and Practice (ISBN 0-07-037827-4) is the third volume of the 
series. It covers many areas of numbers and computational methods for microcom- 
puters. It serves as an introduction to number systems, floating-point numbers, 
numerical methods, random number generators, and the mathematics of computer 
graphics. There are many practical programs included, as well as numerous references 
for further study of the subject. 

Volume 4 is Bits and Pieces, not an altogether whimsical title. This is perhaps the 
most advanced book of the series thus far, delving into some of the more difficult 
topics of programming. In many instances it gets down to the most fundamental levels 
of programming microcomputers, covering topics only slightly removed from the ac- 
tual hardware of the machine, such as stacks and interrupts. At other times, the discus- 
sions cover the other end of the spectrum, detailing the use of executives, or 
sophisticated operating systems on microcomputers. 

The three sections of this volume, Software Systems, Data Handling, and Advanced
Techniques, bring together up-to-date information on some of the more complex prob-
lems and applications facing the microcomputer programmer. Although the subjects
covered are diverse, they will all help you become a more effective microcomputer
programmer.

Blaise W. Liffick 
Editor 


Software Systems 


About This 
Section 


Anyone who has been involved in 
computing for long is at least somewhat 
familiar with the history of computers. It 
has been an interesting history, 
although, for the most part, it spans less 
than 50 years. During that time, four 
basic generations of computers can be 
identified, with the microcomputers be- 
ing the most recent development. 

Many people have pointed out how 
similar the development of the micro- 
computer has been to the previous 
development of the minicomputer. 
Recall that the minicomputer came 
about as a result of the large-scale 
machines being too fast for the relative- 
ly slow peripheral input/output devices 
such as printers. The minicomputers 
were designed to interface between the 
large-scale machines and the 
peripherals, keeping the large machines 
from wasting time. It took some time 
before these “peripheral data pro- 
cessors” were recognized for their own 
potential as general-purpose computers. 

Similarly, the microprocessor was 
developed as a process controller. It was
several years before anyone actually put
one to use to drive what we now call a 
microcomputer. Again paralleling the 
minicomputer, the microcomputer 
began as a single-user machine, running 
one program at a time, essentially 
without an operating system. The 
microcomputer has been considered too 
slow to handle sophisticated system 
software. This criticism, along with that of
insufficient main memory, was also
previously leveled at the minicomputer,
but anyone familiar with the industry 
knows that the minicomputer is now a 
well-respected machine. 

As this section clearly shows, many 
people in the industry have apparently 
misjudged the capabilities of the 
microcomputer as well. With the speed 
of microprocessors increasing, and the 
size of memory on the rise, perhaps 
there is little the microcomputer will not 
ultimately be able to do. The articles in- 
cluded here make it obvious that the 
areas of multiprogramming, multi- 
processing, and time-sharing are not the 
exclusive realm of larger machines. 


A Real-Time Executive for Your Microcomputer


Wayne Crutchley 


Many personal computer owners today 
are seeking new and better uses for their 
microprocessors. Often starting with a 
basic system capable of displaying and 
programming memory, playing various 
keyboard games, or running some ver- 
sion of BASIC, the serious computer en- 
thusiast begins to extend this package in- 
to a specially customized system. This 
specialized system may include such 
features as expanded memory, disk- or 
tape-operating systems, special in- 
put/output (I/O) peripherals for analog- 
to-digital conversion, music, speech, 
graphics, or any number of other com-
puter applications. All of these additions
may amount to a seemingly endless job, 
involving continuous work towards com- 
pletion of the system. 

Where is the end to this continuous 
design and upgrading? When will the 
system be complete so that it need not be 
revised, but simply used? For many 
designers, there is no end; moreover, 
total satisfaction and completion will 
never be achieved on any system. 
However, for many others, the ultimate 
system design may be envisioned early in 
their personal computer adventure and, 
through well-planned and dedicated 
work, may eventually be achieved on a 
permanent basis. 

This presentation discusses one such 
variation of a completed, custom-
designed microprocessor system, which
has as its heart a real-time priority inter- 
rupt executive. Real-time processing is 
used today in a number of applications, 
large and small, with many different com- 
binations of hardware and software 
systems. This system, like all the others, 
contains the basic components that are 
present in all real-time operating systems, 
and this article will illustrate these basic 
components and principles through the 
description of this version of the ex- 
ecutive. 

There are three basic aspects of the 
real-time executive: 


• real-time program execution
• priority scheduling and reentrancy
• interrupt processing


Combining these three components into 
an integrated system results in an 
executive operating system with great 
capabilities. For commercial use, such an 
operating system excels in such
applications as process control or data
acquisition (the main applications of the
system presented here). For the personal
computer owner, it can provide
real-time or process-control capabilities,
as well as batch or multiple-user terminal
processing.

For example, envision the following 
home computer system equipped with a
real-time executive; this system could be
simultaneously performing the following 
tasks for the personal computer user: 


• controlling home or office utilities on a
real-time basis (heat, electricity, ap-
pliance control, security, etc)

• monitoring status of these utilities with
video display or hard-copy printout

• supporting multiple-terminal or
keyboard inputs from various users

• supporting high-speed peripherals like
disks or digital cassettes, along with
operating systems to run major pro-
grams such as compilers, assemblers,
or word processing programs

• supporting interfaces to specialized or
experimental peripherals such as
musical, vocal, or test equipment

• recording periodically acquired data
and system date and time

• supporting system debugging or on-
line machine-language programming
utilities

• maintaining data-link communication
to a similar microprocessor system or
to a higher-level system at a remote
location


Is it feasible to expect this kind of per- 
formance from a microcomputer? The 
answer to this is an unconditional yes. A 
large fraction of the total processing 
capabilities of many personal and com- 
mercial microcomputer systems is often 
wasted, such that the total power of a 
microprocessor is sometimes under- 
estimated. For example, consider the 
question: What task consumes most of 
the total real time of a microprocessor 
system? If the subject is the typical per- 
sonal computer system, the answer is that 
most of the time (up to 99% or more) is 
spent looping or waiting for a keyboard 
input or some other event from the exter- 
nal world to key the next sequence of 
events. This implies that almost all of the 
computer’s processing power (and, 
similarly, the owner’s investment) may be 
wasted. The presentation of the following 
real-time executive will attempt to ex- 
plain how much of this wasted process- 
ing power may be harnessed to create a 
system with powerful capabilities that 
will enable a user to get the most out of 
his or her personal computer. 


Real-Time Executives 

A real-time executive is a program, or 
operating system, capable of handling a 
large number of tasks more or less 
simultaneously on a priority basis. These 
tasks may include program scheduling, 
interrupt servicing, peripheral com-
munication, and others. To dispel a myth
which some personal computer owners 
may have concerning real-time process- 
ing, the real-time executive does much 
more than merely keep time, as the term 
real-time may suggest. Timekeeping is 
merely one of the tasks (normally a 
relatively insignificant one, in com- 
parison to the total system task) that the 
executive may perform. 

The real-time priority executive gets its 
name from the manner in which program 
execution takes place within the system. 
The heart of the system is a clock-driven 
interrupt to the microprocessor that 
causes the system to execute certain pro- 
grams periodically and interrupt less im- 
portant programs according to a well- 
defined set of priorities. In almost all 
cases, the clock interrupt will be a 
precise, crystal-controlled or 60 Hz time 
base such that various program execu- 
tions will be performed accurately 
enough to do time-based functions like 
keep time and date. (This, however, is 
not an absolute requirement, merely a 
convenient feature.) In addition to the 
clock interrupt, the system receives 
various other interrupts that are generally 
associated with certain peripheral 
devices which request attention. 

Since multiple events will be con- 
tinuously occurring, both from external 
devices and from internal programs, the 
executive must be able to handle and ser- 
vice the events in the proper sequence. 
The idea that the processor is performing 
tasks simultaneously is, of course, not 
true, but it does appear to be that way, as 
it is humanly impossible to perceive the 
speed at which the computer performs 
tasks on a multilevel system. 


Priority 

The executive must handle a large 
number of tasks, since the total amount 
of work to be completed varies and since 
tasks are requested asynchronously from 
the external world. To handle these tasks, 
the executive must have a way of 
scheduling the sequence in which it per-
forms certain programs when there is
more than one to perform at any given
time. The sequence in which the ex- 
ecutive will execute programs will de- 
pend on the priority of each program, 
which is set by the computer operator at 
the time the program is linked to the ex- 
ecutive. 


The necessity of priority is due to the 
importance of a particular task relative to 
any other tasks on the system. For exam- 
ple, assume a program such as a BASIC 
interpreter is linked to a real-time ex-
ecutive, and the user has just initiated a
BASIC program which involves a long 
calculation that may take several 
seconds, or even minutes, of the pro- 
cessor’s time to complete. There will be 
plenty of processor time in the next few 
seconds or minutes to perform the BASIC 
calculation; however, before the calcula- 
tion is completed, it is likely that some 
other task will be scheduled to execute 
on the system. For example, someone 
may request program service from 
another keyboard, a message may be 
received from a data link, or a regularly 
scheduled program, such as security 
monitoring, may be enabled for im- 
mediate execution. Naturally, the more 
important programs should not have to 
wait on a less critical function, such as a 
BASIC calculation, which is normally ex- 
pected to take some time anyway, and 
therefore can wait. The BASIC calcula- 
tion is then said to have a lower priority 
than, for example, the security program. 

Another example on a more critical 
time basis would be the servicing of a 
high-speed peripheral, such as a disk or 
tape interface. When the ready-interrupt
signal from such a device occurs, it is
critical that it be serviced immediately to
avoid losing the data. Therefore, this type
of program is even more important than
the regularly scheduled type of program
and must have a still higher priority.

As we can see, there exists a need for
the executive to treat programs on a
priority basis. There also exists the need
for different degrees of importance. How
are these different degrees of importance
measured? What are the ground rules for
the executive in handling these different
degrees of priority? For the executive
presented here, the degree of priority is
measured by the level of the program,
and the ground rules for the handling of
the programs at the different priority
levels are shown in figures 1, 2, and 3.

These simple illustrations represent the
real-time executive as a group of water
reservoirs, labeled level 0 thru level 3.
(Level 0 has the highest priority and level
3, the lowest.) These reservoirs represent
the various priority levels of the execu-
tive, which contain the current tasks
to be performed. In the computer, this
reservoir network is actually a list or stack
of programs in the processor's main
memory; hence the name job stack.

The water contained in the reservoirs is
analogous to the current requested
workload of the executive. The valves
that control the water into and drain the
water out of these reservoirs are
analogous to the interrupt service
routines and the priority scheduler,
respectively.

Figure 1: Visualizing the real-time executive. In figures 1 thru 3, the
water reservoirs represent programs assigned to a given priority level,
and the valves regulating flow out of the reservoirs represent the ex-
ecutive priority scanner. Here, levels 1, 2, and 3 all have work to be
completed, and the priority scanner is allowing level 1 (the highest
priority not empty) to drain (or execute).

Figures 1 thru 3 show a series of 
sequential events, which illustrate the 
rules of priority within the real-time 
executive. In figure 1, a current workload 
exists at levels 1, 2, and 3. Level 0 is 
shown completed or empty at this time. 
Of all the levels that have a workload, 
level 1 has the highest priority. Therefore 
it is being serviced, or ‘‘drained,’’ while 
the programs at the other levels are 
awaiting execution. 

After a time period proportional to the 
total amount of workload originally con- 
tained in level 1, the level will be com- 
pleted. At this time, the executive will 
compare the priorities of all the levels re- 
quested. Level 2 now has the highest 
priority, and it will be the next to be ser- 
viced, as shown in figure 2. While level 2 
is being serviced, an interrupt occurs,
causing the interrupt service routine
of some particular device to request a
program to execute at level 0. This status
is shown in figure 3.

At this point, program execution is tem-
porarily suspended on level 2 in order to
service the higher-priority level 0. Note
that a special flag is set at level 2, in-
dicating that it was interrupted. The pur-
pose of this flag will be explained subse-
quently.

Figure 2: Visualizing the real-time executive. Here, level 1 has finished
its execution and level 2, the next in line, is now being serviced. The
priority principle always forces the executive to service the highest of
requests at any particular moment.

Eventually level 0 will be completed 
(assuming no further tasks are immediate- 
ly added at this level), and the executive 
will again be free to compare priorities. 
This time, level 2 still has the highest 
priority; however, the suspended flag 
exists at this level. This flag indicates that 
the current requested workload at level 2 
has already been started but interrupted 
before completion. In this special case, 
the executive will reenter the program at 
the point of interruption, after restoring 
all registers and status for that program. 
(Each level, of course, has a separate 
space allocated for storage of its working 
registers, status, and program counter; 
the space is unique to that level only.) 
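
The sequence of events in figures 1 thru 3 can be traced with a short simulation. The following Python sketch is an illustration only, not the code of the executive described here: the names Level, highest_pending, and run are hypothetical, and the idea of measuring each level's workload in unit steps is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Level:
    work: int = 0            # units of requested work waiting at this level
    suspended: bool = False  # set when the level is interrupted before completion


def highest_pending(levels):
    """Return the index of the highest-priority level that has work, or None.

    Level 0 is the highest priority, as in the text.
    """
    for index, level in enumerate(levels):
        if level.work > 0:
            return index
    return None


def run(levels, interrupts):
    """Drain one unit of work per step and return a trace of what ran.

    `interrupts` maps a step number to a (level, work) request made by an
    interrupt service routine at that step. The simulation stops as soon
    as every level is empty.
    """
    trace = []
    running = None
    step = 0
    while True:
        if step in interrupts:
            level, work = interrupts[step]
            levels[level].work += work
        chosen = highest_pending(levels)
        if chosen is None:
            break
        # A level that loses the processor with work remaining is flagged.
        if running is not None and chosen != running and levels[running].work > 0:
            levels[running].suspended = True
        action = "reenter" if levels[chosen].suspended else "run"
        levels[chosen].suspended = False
        trace.append((step, chosen, action))
        levels[chosen].work -= 1
        running = chosen
        step += 1
    return trace


# Figure 1: levels 1, 2, and 3 have work; level 0 is empty.
levels = [Level(), Level(work=2), Level(work=3), Level(work=1)]
# Figure 3: an interrupt at step 3 requests one unit of work at level 0.
for entry in run(levels, {3: (0, 1)}):
    print(entry)
```

In this trace level 1 drains first, level 2 starts, level 0 takes over at step 3, and level 2 is then marked "reenter" at step 4 before level 3 finally runs.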

When any interrupt occurs on the
system, all interrupt service routines that
will eventually lead to a task request must
always save the processor status for the cur-
rent level and set the appropriate flags.
This is so the eventual return to the inter-
rupted level (and program) will be made
exactly where the program left off, with
the exact processor status at the time of
the interrupt. (This basic concept is
known as reentrancy, which will be fur-
ther explained later.)


The simple example shown in figures 1
thru 3 illustrates only one of the many
possible sequences and situations which
may occur during real-time executive
scheduling. It should be noted that the
interrupt service sequence, described
above, can also occur in a multiple-level
or chain-reaction manner. For example,
assume that while level 2 is in the 
suspended state and level 0 is being ser- 
viced, as above, another interrupt occurs 
on the system, resulting in tasks being re- 
quested on the system at level 1. For this 
case, during the interrupt service routine, 
both level 2 and level 0 have job re- 
quests, and both are in the suspended 
state. (All processor status data for each 
level is saved in areas unique to each 
level.) Upon completion of the particular 
interrupt service routine, priorities of all 
requested tasks will be compared (in- 
cluding both interrupted and new tasks). 
The task with the highest priority will 
always be entered (or reentered) first. In 
this case level 0 would be reentered. 
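
The chain-reaction case above, with a separate save area for each level, might be modeled as follows. This is a minimal Python sketch under stated assumptions: the class Executive, its method names, and the use of a single integer pc to stand for the whole processor status are all hypothetical and are not taken from the executive itself.

```python
def pick(requested):
    """Return the highest-priority requested level (lowest number), or None."""
    return min(requested) if requested else None


class Executive:
    def __init__(self):
        self.requested = set()  # levels with a job request outstanding
        self.save_area = {}     # one save area per level: level -> saved pc
        self.current = None     # level now executing, if any
        self.pc = 0             # stand-in for program counter and registers
        self.log = []

    def interrupt(self, new_level):
        """Model an interrupt service routine that requests a task.

        Saves the status of the running level, records the request, and
        returns the levels that are suspended while the routine runs.
        """
        if self.current is not None:
            self.save_area[self.current] = self.pc
        self.requested.add(new_level)
        suspended_now = sorted(self.save_area)
        self.dispatch()
        return suspended_now

    def dispatch(self):
        """Compare priorities and enter (or reenter) the highest level."""
        level = pick(self.requested)
        self.current = level
        if level is None:
            return
        if level in self.save_area:
            self.pc = self.save_area.pop(level)  # restore saved status
            self.log.append(f"reenter level {level} at pc={self.pc}")
        else:
            self.pc = 0
            self.log.append(f"enter level {level}")

    def step(self, count=1):
        """Advance the running program by `count` instructions."""
        self.pc += count

    def complete(self):
        """The running level finishes; rescan for the next one."""
        self.requested.discard(self.current)
        self.dispatch()


executive = Executive()
executive.interrupt(2)               # work is requested at level 2
executive.step(5)
executive.interrupt(0)               # level 2 suspended, level 0 entered
executive.step(3)
suspended = executive.interrupt(1)   # request at level 1 while level 0 runs
executive.complete()                 # level 0 done
executive.complete()                 # level 1 done
executive.complete()                 # level 2 done
print(suspended)
print(executive.log)
```

During the third interrupt both level 0 and level 2 are held in their save areas, and level 0 is reentered first because it still has the highest priority; level 1 runs next, and level 2 resumes last at the point where it was interrupted.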

To any active program, any interrupt 
on the system will be essentially unno- 
ticed, since the program will always be 
entered (or reentered) with all processor- 
status data intact. Hence, as far as in-
dividual programs are concerned, they
all run simultaneously, each program us-
ing processor time on a priority basis,
whenever it is available, to complete
its individual tasks. The operation of
the priority principle may be summarized 
as follows: a program at any particular 
priority level will be allowed to execute 
only when all programs at a higher level 
are complete. In figures 1 thru 3 we see 
the processing of jobs by a priority inter- 
rupt system as the control of water into 
and out of reservoirs, each of which 
represents a priority level for the jobs 
(water) contained within. Normally the 
processor is sitting with all ‘‘reservoirs’’ 
empty, waiting for a task to be requested. 
When one or more tasks arrive, they will 
be finished immediately, and the pro- 
cessor will be ready for more tasks. The 
total amount of time that the processor 
spends with tasks to be completed at any 
level, as compared to the total available 
time, is referred to here as the job 
loading and will be discussed later. 


For the home microprocessor system 
described earlier, this job loading should 
average out to approximately 50% or
less. However, this does not mean that at
all times there will be 50% processing
capability unused. For example, during 
the long BASIC calculation described 
earlier, which may take several seconds 
or even minutes, the job loading will be 
100% until the calculation is complete. 
Assuming that before the calculation was 
initiated, the processor had a 55% job 
loading, then during the calculation, this 
55% of the processor time will still be 
devoted to its normal tasks, and 45% will 
be put towards the calculation. The 
greater the normal job loading is, the 
longer the calculation will appear to take 
(although the actual amount of processor 
time will always be the same). 
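
The arithmetic behind this example can be written out. The following Python sketch is a rough model only: the function name apparent_duration is hypothetical, and it assumes that the normal job loading stays constant and consists entirely of tasks at a higher priority than the calculation.

```python
def apparent_duration(cpu_seconds, normal_load):
    """Estimate the elapsed time of a low-priority calculation.

    cpu_seconds -- processor time the calculation itself needs
    normal_load -- fraction of processor time (0 up to, but not including, 1)
                   taken by the normal, higher-priority tasks
    """
    if not 0 <= normal_load < 1:
        raise ValueError("normal_load must be at least 0 and less than 1")
    available_fraction = 1.0 - normal_load  # share left for the calculation
    return cpu_seconds / available_fraction


for load in (0.0, 0.55, 0.90):
    elapsed = apparent_duration(9.0, load)
    print(f"{load:.0%} normal loading: about {elapsed:.0f} seconds")
```

Under these assumptions, a calculation needing 9 seconds of processor time on a system with a 55% normal job loading would appear to take about 20 seconds, since only 45% of each second is available to it.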

During the calculation, any program at 
a lower priority than the BASIC routines 
will not be allowed to run, or will be 
suspended until the calculation is com- 
plete. Thus, the priority system gives the 
user a method of distributing or 
budgeting the total processor time. These 
descriptions basically explain the priority 
principles used in the real-time ex- 
ecutive. Many questions may arise at this 
time, such as: How should it be deter- 
mined at what level to place any par- 
ticular program? What happens when 
more than one program on a level is re-
quested at the same time? How do inter-
rupt service routines request tasks to be
run in the executive? How do I link my
own custom programs to the real-time
executive? These and similar questions 
will be dealt with in later sections. 


Program Scheduling 


One of the ways in which programs 
become enabled, or requested to ex- 
ecute in the real-time system, is through 
the scheduler, which is an integral part of 
the real-time executive. The scheduler is 
linked to the real-time clock interrupt, 
and includes the clock-interrupt service 
routine as one of its program segments. 
On most practical real-time systems, it is 
desirable to have certain programs run 
on a regular, cyclic basis; for example,
once per second, once per 100 ms, etc.
For the personal computer user, these
programs may perform the following
functions: security monitoring, utilities
monitoring, video updating, timekeep-
ing, etc. This type of program execution
mode is referred to here as polling. 

The clock-interrupt service routine will
have access to two tables: one that
points to various programs to be ex-
ecuted, and another that is essentially a
list of programs grouped in order of their
priority of execution. For the executive
presented here, these tables are called
the poll table and executive job stack,
respectively.

Figure 3: Visualizing the real-time executive. Here, during the servicing
of level 2 (in figure 2), an interrupt occurs, resulting in new tasks flowing
into level 0. Therefore, level 2 is suspended, and a special “suspended”
flag is set. Level 0 is now being serviced, and when it is finished, level 2
will restart. Level 2 is then said to be “reentered.”

The executive job stack contains cer- 
tain information about each program and 
ties the program to a certain level and
program number. The level determines 
the relative priority of a program. Also, 
only one program from a given level can 
be running, scheduled, or interrupted by 
the executive. 

The function of the poll table is to
group programs (defined as level-
number/program-number pairs) accord-
ing to the frequency with which they will
be executed. Each group within the poll
table contains three items: an execution
time that determines the frequency of ex-
ecution of the programs in that group, an
active timer initialized to the execution
time (periodically decremented until it
signals execution of the programs at a
count of zero), and a list of pointers to the
executive job stack of the programs to be
run at this interval.

At each clock interrupt, each active 
timer is decremented by 1, and, when 
any one expires, the associated programs 
linked to that poll group are enabled to
execute. Before returning to the inter-
rupted program, the interrupt routine will 
scan the executive job stack to check the 
status of programs. If any of the newly 
enabled programs are found to be at a 
level higher than the interrupted pro- 
gram, the interrupted program will be left 
suspended, and program control will be 
given to the newly scheduled task. Other- 
wise, the interrupted program will be 
resumed. 
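
The poll table and the work done at each clock interrupt can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the executive's own tables: the names PollGroup, clock_tick, and level_after_tick are hypothetical, the (level, program number) pairs stand in for pointers into the executive job stack, and reloading an expired timer with its execution time is assumed so that the group repeats.

```python
from dataclasses import dataclass, field


@dataclass
class PollGroup:
    execution_time: int  # period of the group, in clock ticks
    programs: list       # (level, program_number) pairs for this group
    active_timer: int = field(init=False)

    def __post_init__(self):
        # The active timer starts out equal to the execution time.
        self.active_timer = self.execution_time


def clock_tick(poll_table, enabled):
    """Decrement every active timer by 1 and enable expired groups."""
    for group in poll_table:
        group.active_timer -= 1
        if group.active_timer == 0:
            enabled.update(group.programs)
            group.active_timer = group.execution_time  # assumed reload


def level_after_tick(enabled, interrupted_level):
    """Return the level that gets control when the clock routine exits.

    A newly enabled program takes over only if its level is higher in
    priority (a lower number) than the interrupted level; otherwise the
    interrupted program is resumed.
    """
    best = min((level for level, _ in enabled), default=None)
    if best is not None and (interrupted_level is None or best < interrupted_level):
        return best
    return interrupted_level


poll_table = [
    PollGroup(execution_time=2, programs=[(1, 0)]),
    PollGroup(execution_time=3, programs=[(3, 0), (0, 1)]),
]
enabled = set()
for tick in range(1, 4):
    clock_tick(poll_table, enabled)
    print(tick, sorted(enabled), level_after_tick(enabled, 2))
```

With a program at level 2 interrupted on every tick, nothing is enabled on the first tick, the level 1 program takes over on the second, and the level 0 program takes over on the third.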

Thus it can be seen how normally 
scheduled programs of a higher level can 
interrupt lower-level programs through 
the actions of the clock-interrupt routine, 
thus maintaining priority for all schedul- 
ed programs. 

As shown above, the scheduler is 
merely a sophisticated network of timers, 
which are counted down and monitored 
at precise intervals through the actions of 
the real-time clock interrupt. The expira- 
tion of any of these timers causes a pro- 
gram or group of programs to be enabled 
and eventually executed on a priority 
basis by the executive. 




Interrupt Processing 

Another way that programs are ena- 
bled to execute is through interrupts from 
peripheral devices. Many different varia- 
tions of interrupt routines are possible 
and are generally customized to the par- 
ticular device concerned. All interrupt 
service routines must, however, be link- 
ed to the executive in some organized 
fashion. For this particular executive, this 
is done by making various major utility 
subroutines available for use, such that 
custom interrupt routines are easily writ- 
ten. The implementation section deals 
with this subject in further detail. 

At this time, one point should be made 
concerning priority: the levels, as 
described in the previous sections, refer 
to the software levels of priority only. 
There exists another level of execution 
which has a higher level of priority than 
even the highest software level. This is 
the interrupt level. Any program running 
at the interrupt level will have absolute 
control of the processor and cannot be 
interrupted or suspended in any way. 


(Figure 4 block labels: interrupt request latches with inputs IRPT0, IRPT1, ...; interrupt mask register; priority encoder; device register in; processor I/O interface; interrupt request; interrupt acknowledge; device register out; N-line decoder; strobe; interrupt request reset.)


Figure 4: A block diagram of a hardware priority interrupt system. Priority detection is done by the encoder, which 
selects one of the N lines and encodes it into a binary representation of the selected line. Latched interrupt requests are 
reset when they are acknowledged by the processor through the device register-out port. A mask register is shown here 
to provide software control to disable any interrupts not used or not needed. 




(Naturally all interrupts are disabled for 
this program mode.) 

For best operation on any real-time 
system, another priority chain should be 
implemented to handle interrupts. This 
priority chain is done with the hardware, 
which interfaces the processor with the 
external interrupting devices. A hardware 
block diagram of this priority chain is 
shown in figure 4. This diagram may be 
applicable to a number of different 
microprocessors. 

The heart of this system is a hardware 
priority encoder (such as the 74147 or 
74148 integrated circuit), which encodes 
1 of N lines from peripheral devices into a 
binary code that identifies the pending 
device. A priority circuit is used here for 
the case when more than one interrupt 
occurs before the processor has a chance 
to acknowledge any of them. In this in- 
stance, the one with the highest priority 
will be serviced first (analogous to the 
procedures for software priority-level
service). Optionally, the priority detection
may be accomplished by software at the
interrupt level. This requires directly
reading the latched requests and using an
algorithm to determine priority; this
scheme, however, is not dealt with here.

All of the latched interrupt requests are 
gated through an interrupt mask register 
before being presented at the encoder in- 
put. This enables the processor to switch 
off unused interrupts, or those that are 
not needed at the moment. After 
acknowledging any interrupt, the pro- 
cessor must reset the latched request so 
that it will not be interrupted again by the 
same event. This is accomplished by the 
device register out port, used by the pro- 
cessor to cause a pulse that strobes the 
reset on a given individual request latch, 
thus freeing the latch to receive another 
interrupt once the current interrupt has 
been serviced. Normally, this request
reset will be a hardware function.
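
The behavior of the latches, mask register, and priority encoder can be modeled in software. The sketch below is in Python rather than Z80 assembly; the function names are illustrative, and it assumes that a 1 bit in the mask enables a line and that device 0 has the highest priority.

```python
def highest_pending(latches, mask, n_lines=8):
    """Return the number of the highest-priority pending device, or None.

    Bit i of `latches` is the request latch of device i. A 1 bit in `mask`
    lets that request through to the encoder (assumed polarity).
    """
    gated = latches & mask
    for line in range(n_lines):          # line 0 is checked first
        if gated & (1 << line):
            return line
    return None


def acknowledge(latches, line):
    """Reset one request latch, as the strobe from the out port would."""
    return latches & ~(1 << line)
```

With requests latched from devices 4 and 6, device 4 is reported first; masking device 4 off lets device 6 through instead.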

Since figure 4 is meant as a general 
block diagram, it may not be the best ap- 
plication, exactly as shown, for all micro- 
processors. However, it is the basis of a 
viable scheme that will fulfill the real-time 
executive’s needs. For variations of such 
designs, it is recommended that hard- 
ware priority detection be used over soft- 
ware methods, due to the speed 
advantages. Also, encoding (that is, N 
lines encoded to a binary number) 
should be used to allow expansion to a
large number of devices. Finally, a mask 
register is recommended for greater flex- 
ibility in dealing with peripheral inter- 
rupts. 

It should be made clear at this time that
the hardware priority levels are not
necessarily related to software levels, but 
are merely a method of distinguishing a 
further and higher level of importance. 
For example, a device that has hardware 
priority level 0 (ie: highest priority) can be 
linked to an interrupt service routine 
which may eventually cause a program to 
be enabled at software level 3, say, or 
vice versa. In this case, the hardware 
device and the software program servic- 
ing the device have totally different 
priorities. The hardware priority system is 
merely a way in which a secondary 
priority system may be implemented at 
the hardware level. This allows better 
control of interrupt devices, since the ex- 
ecutive has little control over hardware 
interrupts, other than enabling and 
disabling them. The hardware priority in- 
terrupt scheme is very important when 
using high-speed peripherals on the 
system. When only slower devices are
used, however, it is more a convenience
than a necessity.


Real-Time Executive Example 

A working version of the real-time 
executive is presented in this section. The 
assembled source listings, along with 
other diagrams, are used to illustrate its 
structure and operation. This particular 
version is being used as the executive for 
a commercial process-control design that 
handles tasks in excess of those in the 
home system described earlier. Used and 
tested in a home system, it was found to 
be extremely easy to work with and to in- 
terface. This makes it ideal for the per- 
sonal computer system. This software’s 
initial test applications have shown that 
the real-time executive and the particular 
microprocessor used have powerful 
capabilities when they are linked to effi- 
cient system programming. 

The listings presented here show the 
final assembled and debugged programs. 
The listings that cover some of the table 
structures, however, have been ab- 
breviated to save space, but the initial 
portions of the tables illustrate the overall 
structure so it may be extended by the 
reader. 

This version is assembled at hex- 
adecimal address 1000 and occupies 2 
to % K bytes, depending on the length of 
the executive job stack, poll tables, and 
stack areas. The version currently in- 
cludes eight priority levels, with four pro- 
grams per level; however, it can easily be 
expanded to provide up to 16 levels, with 
as many as 16 programs per level, a 
capacity which, of course, will take up 
more memory space. The Z80 is used for
this version of the real-time executive, as
this processor has many advantages over
most other 8-bit microprocessors for this
particular type of system. Despite the
advantages of the Z80, many of the more
powerful microprocessors available today
could also handle the real-time executive
very well. The principles of real-time
priority processing are equally applicable
to these other processors.

*
* PRIORITY INTERRUPT REAL TIME EXECUTIVE
* SYSTEM POINTERS & EQUATES
*
XAL    DB   0        CURRENT LEVEL (B7=ACTIVE FLAG,B0-3=LVL#)
XLVL   DB   0        LVL POINTER DURING XJS SCAN
XPGM   DW   0        XJS PGM ADDR DURING A LEVEL SCAN
XRSC   DB   0        RESCAN FLAG, SET TO -1 WHEN XJS RESCAN IS REQ
XMSK   EQU  08       NORM IRPT MSK
XMSC   EQU  08       MSK DURING RTC SRVC
XME    DB   0        EXT USER MSK BITS
       DS   24       EXEC STACK
XSTK   EQU  $
XTMP   DW   0        EXEC TMP SAVE
*
* EXECUTIVE JOB STACK
*
* FOR EACH PGM, FORMAT AS FOLLOWS
*   BYTE 0 - B7 PGM ENABLE BIT
*            B6 PGM SCHEDULED BIT
*            B5-B0 SPARES
*   BYTE 1,2 = L,H PGM ADDR
*
XJS    EQU  $        START XJS
0:     EQU  $        STRT L0
       DB   0        L0,P0
       DW   0
       DB   080      L0,P1
       DW   XDBG     DEBUG PGM
       DB   0        L0,P2
       DW   0
       DB   0        L0,P3
       DW   0
       DB   -1       END L0
1:     EQU  $        STRT L1
       DB   0        L1,P0
       DW   0
       DB   0        L1,P1
       DW   0
       DB   0        L1,P2
       DW   0
       DB   0        L1,P3
       DW   0
       DB   -1       END L1
2:     EQU  $        STRT L2
       DB   0        L2,P0
       DW   0
       DB   0        L2,P1
       DW   0
       DB   0        L2,P2
       DW   0
       DB   0        L2,P3

Listing 1: System pointers and executive job stack. The system pointers
and "equates" tables show the global temporary storage area used by
the executive. The executive job stack XJS shows the program format
used for entries in the executive job stack. The levels are labeled as "0:",
"1:", "2:", etc. These levels may be relocated to anywhere in memory.

Generally, the following features of the 
Z80 have been the most advantageous 
for the real-time executive and associated 
software: 


• generally powerful instruction set

• flexible input/output (I/O) capabilities
(eg: register indirect, block I/O)

• excellent hardware interrupt
capabilities (eg: Z80 mode-2 interrupt)

• high-speed operation


In particular, the I/O capabilities of the 
Z80 have been shown to be the most 
useful for multilevel processing, where 
common I/O subroutines are shared by 
multiple levels on a reentrant basis. This 
allows easy, efficient means of writing 
and interfacing I/O software to multilevel 
programs. The hardware interrupt system 
allows easy implementation of a hard- 
ware priority-interrupt chain, giving 
ultimate flexibility in creating or 
relocating interrupt service routines. The 
restart vectors are then freed for other 
uses. 


Software-Table Structure 
System pointers (listing 1) 

The first half of listing 1 shows the 
global system pointers for the executive. 
These pointers are highly important. 
They are responsible for keeping track of 
the master executive status (ie: where it 
is) at all times. Among the more impor- 
tant of these pointers: 


XAL          Current active executive level

XMSK, XMSC   Interrupt mask register enable bits

XSTK         Executive active stack area


Executive job stack (XJS, listing 1) 

The second half of listing 1 shows the
start of the executive job stack. All user
programs linked to the executive must be
listed in the job stack. This particular
stack is listed in order of priority.
However, it does not need to be; any
level of the executive job stack can be
added or relocated to anywhere in
memory with a simple 2-byte change. For
each program there is a 3-byte entry in
the executive job stack. The first byte
contains the program-enable and status
bits (the program can be manually
enabled or disabled by the user by altering
these bits). The next 2 bytes contain the
address of the program.

*
* LEVEL CONTROL TABLES
*
* 8 BYTES PER TABLE
* 1 TABLE PER LVL
*
XLT    EQU  $        STRT LVL TBLS
*
       DW   0:       L0 XJS
       DB   0        STATUS (B7,LVL REQ B6,LVL IRPT'D)
       DW   XST0     STK ADDR
       DW   0        SP SAVE
       DB   0        SPARE
*
       DW   1:       L1 XJS
       DB   0        STAT
       DW   XST1     STK
       DW   0        SP
       DB   0
*
       DW   2:       L2 XJS
       DB   0        STAT
       DW   XST2     STK
       DW   0        SP
       DB   0
*
       DW   3:       L3 XJS
       DB   0        STAT
       DW   XST3     STK
       DW   0        SP
       DB   0
*
       DW   4:       L4 XJS
       DB   0        STAT
       DW   XST4     STK
       DW   0        SP
       DB   0
*
       DW   5:       L5 XJS
       DB   0        STAT
       DW   XST5     STK
       DW   0        SP
       DB   0
*
       DW   6:       L6 XJS
       DB   0        STAT
       DW   XST6     STK
       DW   0        SP
       DB   0
*
       DW   7:       L7 XJS
       DB   0        STAT
       DW   XST7     STK
       DW   0        SP
*
XT01   EQU  $
* TBL FOR 0.1 SEC MIN POLL PGMS
*- 0.1 SEC: -
       DB   1        0.1 SEC
       DB   1        ACTIVE TIMER
       DB   000      L0,P0
       DB   010      L1,P0
       DB   020      L2,P0
       DB   030      L3,P0
       DB   040      L4,P0
       DB   050      L5,P0
       DB   060      L6,P0
       DB   070      L7,P0
       DB   -1       END GROUP
*- 0.5 SEC: -
       DB   5        0.5 SEC
       DB   1        TIMER
       DB   0        NOP
       DB   011      L1,P1
       DB   021      L2,P1
       DB   031      L3,P1
       DB   -1       END GROUP
*
       DB   0        END XT01
*
XT10   EQU  $
* TBL FOR 1.0 SEC MIN POLL PGMS
*- 1.0 SEC: -
       DB   1        1.0 SEC
       DB   1        TIMER
       DB   002      L0,P2
       DB   012      L1,P2
       DB   022      L2,P2
       DB   032      L3,P2
       DB   042      L4,P2
       DB   043      L4,P3
       DB   052      L5,P2
       DB   053      L5,P3
       DB   062      L6,P2
       DB   063      L6,P3
       DB   072      L7,P2
       DB   073      L7,P3
       DB   -1       END GROUP
*- 60.0 SEC: -
       DB   60       60.0 SEC
       DB   1        TIMER
       DB   023      L2,P3
       DB   033      L3,P3
       DB   041      L4,P1
       DB   051      L5,P1

Listing 2: The level-control and poll-time tables. The level-control table XLT is used to keep track of each level's
status individually. It also contains pointers to the stack area and to the job stack for that level. The poll-time tables
XT01 and XT10 contain data that point to the programs to be scheduled at that particular poll interval.
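
The 3-byte job-stack entry can be decoded as in the following sketch. It is written in Python for illustration only; `parse_level` and the constant names are assumptions. The bit positions (bit 7 enable, bit 6 scheduled) and the -1 end-of-level byte follow the format comments in listing 1.

```python
ENABLE_BIT = 0x80      # byte 0, bit 7: program enabled
SCHEDULED_BIT = 0x40   # byte 0, bit 6: program scheduled
END_OF_LEVEL = 0xFF    # a status byte of -1 ends a level


def parse_level(memory, start):
    """Decode the 3-byte entries of one level, stopping at the -1 byte."""
    entries = []
    addr = start
    while memory[addr] != END_OF_LEVEL:
        status = memory[addr]
        # The program address is stored low byte first, then high byte.
        program = memory[addr + 1] | (memory[addr + 2] << 8)
        entries.append({
            "enabled": bool(status & ENABLE_BIT),
            "scheduled": bool(status & SCHEDULED_BIT),
            "address": program,
        })
        addr += 3
    return entries
```

A program runs only when both bits are set, so a user can leave an entry in the job stack but keep it from executing by clearing the enable bit.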


Level-control tables (XLT, listing 2)

These tables contain the status byte
and the pointers associated with each
individual level. Through the use of static
and dynamic pointers in this table, the
executive is able to execute, suspend,
and reenter the different levels while
maintaining complete status for the
levels. The first 2 bytes point to the
beginning of the executive job stack for that
level (this address may be modified by
the user to relocate or create a new
level). The third byte is a status byte that
contains level flags (level interrupted,
level requested, etc). The fourth and fifth
bytes point to an area of memory that will
be used as the stack area for that level.
The stack pointer is initialized to this
address each time a new program is
executed on that level. (Each level naturally
has its own stack area.) This address may
also be modified to allow for different
stack requirements for different levels.
The sixth and seventh bytes hold the
saved stack pointer while the level is
interrupted or suspended. All other
status is saved directly on the stack. The
last byte is currently unused and is
included to make the table length exactly
8 bytes for simpler indexing purposes.
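
As an illustration of the 8-byte layout, the sketch below unpacks one level-control entry from a block of bytes. It is a Python model, not part of the executive; `level_entry` is a hypothetical name, and the status-bit positions (bit 7 requested, bit 6 interrupted) are taken from the status comment in listing 2.

```python
import struct

# 2-byte job-stack pointer, status byte, 2-byte stack area,
# 2-byte saved stack pointer, spare byte: 8 bytes, low byte first.
ENTRY = struct.Struct("<HBHHB")


def level_entry(table, level):
    """Unpack the control-table entry for one level (8 bytes per level)."""
    job_stack, status, stack_area, sp_save, _spare = ENTRY.unpack_from(
        table, level * ENTRY.size)
    return {
        "job_stack": job_stack,
        "requested": bool(status & 0x80),    # bit 7: level requested
        "interrupted": bool(status & 0x40),  # bit 6: level interrupted
        "stack_area": stack_area,
        "sp_save": sp_save,
    }
```

Because every entry is exactly 8 bytes, the offset of a level is just the level number times 8, which is the indexing convenience the spare byte buys.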


Poll-time tables (listing 2) 

These poll-time tables are divided into 
two major segments, those that are 
scanned 10 times per second, and those 
that are scanned once per second. Thus, 
the various poll groups within each seg- 
ment may have a minimum resolution (or 
minimum poll time) of 0.1 second and 
1.0 second, respectively. (This
breakdown is done to minimize
overhead in the table-scanning routines.)
Within each of the two segments, there
may exist any number of poll groups,
each of them having its own individual
poll time, which is always a multiple of
the minimum poll time. The first 2 bytes
of each poll group are the actual poll
time, followed by the active timer. Next
comes a list of program pointers of
variable length that is terminated by a
byte containing -1. For each of the
program pointers in the group, one program
in the job stack will be scheduled at that
group's poll time. The program pointer
bytes are divided into a high and low
4-bit nybble. The high nybble identifies
the level in the job stack, and the low
nybble points to a program on that level.
This arrangement allows flexibility in
assigning varying amounts of poll time to
any program on any level of the job
stack.

*
* PROGRAM - XRTI
*
* DIRECT RTC IRPT HANDLER
*   1) SCANS POLL TBLS
*   2) SCHEDULES PGM XQT
*   3) EXITS TO XSCN
*
XRTI:  CALL XUTO         SAVE ALL STAT
       EI
       XOR  A
       LD   (F:),A       SET PASS 1
       LD   HL,XT01      .1 SEC TBL
1:     LD   A,(HL)
       AND  A
       JR   Z,2:         IF END TBL
       INC  HL
       DEC  (HL)         COUNT DN
       JR   Z,5:         IF EXPIRED
       LD   A,-1
3:     INC  HL
       CP   (HL)
       JR   NZ,3:
6:     INC  HL           ADV TO NXT GROUP
       JR   1:
5:     DEC  HL           GET POLL
       LD   A,(HL)
       INC  HL
       LD   (HL),A       RESET
       LD   A,-1
       LD   (XRSC),A     SET RSCN FLG
4:     INC  HL
       LD   A,(HL)       GET PGM,LVL
       CP   -1
       JR   Z,6:         IF END GROUP
       PUSH HL           SAV POLL INDX
       LD   HL,XLT
       LD   C,A          SAVE LLPP
       RRA               A=(LVL#*8)=DISPL IN LVL TBL
       AND  078          EXTR LVL # BITS
       LD   E,A
       LD   D,0
       ADD  HL,DE        FORM STRT ADDR IN XJS
       LD   E,(HL)
       INC  HL
       LD   D,(HL)
       INC  HL
       SET  7,(HL)
       LD   A,C          =PGM #
       AND  0F           B0-3,CY=0
       LD   C,A
       RLA               *2
       ADD  C            *3
       LD   L,A
       LD   H,0          HL=DISPL FM LVL STRT
       ADD  HL,DE
       SET  6,(HL)       SET PGM SCHL'D
       POP  HL           =POLL INDX
       JR   4:
2:     LD   A,(F:)
       AND  A
       JR   NZ,XSCN      IF END PASS 2, THEN DONE
       LD   HL,E:        TIMER
       DEC  (HL)         CNT DN
       JR   NZ,XSCN      NOT YET
       LD   A,10
       LD   (HL),A       RESET
       LD   (F:),A       =PASS 2
       LD   HL,XT10
       JR   1:           SCAN 1 SEC POLL TBL
*
F:     DB   0            PASS FLG
E:     DB   01           ACTV TIMER
* END RTI

Listing 3: The clock-interrupt routine, XRTI. This routine counts down
the timers in the poll tables and sets the program ready bits in the job
stack for those programs of a poll group that are ready for execution.
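
The nybble split of a program pointer, and the job-stack entry it selects, can be written out as a short sketch. This is Python for illustration; the function names are assumptions. The factor of 3 comes from the 3-byte job-stack entries described earlier.

```python
def decode_pointer(byte):
    """Split a program-pointer byte into (level, program)."""
    return byte >> 4, byte & 0x0F


def job_stack_entry_address(level_start, byte):
    """Address of the 3-byte job-stack entry that the pointer selects.

    `level_start` is the job-stack address of the pointer's own level.
    """
    _level, program = decode_pointer(byte)
    return level_start + 3 * program
```

For example, the pointer byte hexadecimal 43 names program 3 on level 4, which lies 9 bytes past the start of that level's job stack.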


Interrupt-vector tables (XIVT, listing 7) 

This table is pointed at by the Z80 
mode-2 interrupt register. Any of the 
eight interrupts connected to this system 
will cause the processor to perform an 
automatic indirect call to one of the eight 
addresses listed in this table. The hardware
priority encoder on this system is
configured such that devices 0 through 7 will
cause a hexadecimal F0, F2, F4, ... FE to
be placed on the processor data bus to
be used as the low-order byte of an
interrupt table address. The high-order byte is
given by the value in the I register,
meaning that this table can be easily moved to
any page of memory. Currently existing
in the table are vectors for the real-time 
clock interrupt (ie: device 4) and two 
keyboard interrupts (ie: devices 6 and 7). 
The vector for device 0 is reserved for the 
digital cassette or disk interrupt, and the 
rest are unused. It should be noted that 
this table can be easily expanded to ac- 
commodate up to 128 interrupt vectors 
(ie: 256 bytes). 
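
The address arithmetic of the mode-2 vector fetch can be illustrated as follows. The sketch is in Python; the function names and the I register value used in the example are hypothetical, while the low bytes F0 through FE follow the encoder configuration described above.

```python
def vector_address(i_register, device, base_low=0xF0):
    """Table address read in mode 2: the I register supplies the high
    byte, and the byte placed on the bus by the hardware the low byte."""
    low = base_low + 2 * device          # F0, F2, ... FE for devices 0-7
    return (i_register << 8) | low


def service_routine(memory, i_register, device):
    """Fetch the 16-bit routine address (low byte first) from the table."""
    addr = vector_address(i_register, device)
    return memory[addr] | (memory[addr + 1] << 8)
```

Moving the table to another page only requires loading a different value into the I register; the low bytes generated by the hardware stay the same.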


Program Structure 

The general program structure of the 
real-time executive is shown in figures 5 
and 6 in a Warnier-Orr structured 
diagram. The following paragraphs 
describe the four major program 
segments contained in the executive. 


Clock-interrupt routine (XRTI, listing 3) 

This program is the direct address to 
which the real-time clock interrupt is vec- 
tored; that is, the hardware interrupt 
caused by the real-time clock causes this 
routine to execute. It will first save all 
processor-status data (subroutine XUTO) 
and then search for poll groups that re- 
quire servicing (those with an actual 
timer value of zero). Upon finding a 
group that is ready, its poll timer will be 
reset. Those programs that the group 
points to will be scheduled by setting the 
appropriate bit (bit 6) in the job stack 
control word for that program. After 
scanning all poll tables that require ser- 
vice, XRTI will exit to the program XSCN. 
If any programs have been scheduled, a 
special rescan flag will be set. The purpose
of the rescan flag will be explained
later.
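
One pass of the clock routine over a poll table can be modeled as below. This is a Python sketch under stated assumptions, not a transcription of XRTI: the table is a `bytearray` laid out as poll time, active timer, pointer bytes, and a -1 terminator per group, with a poll time of 0 ending the table, and `schedule` is a hypothetical callback standing in for setting bit 6 in the job stack.

```python
END = 0xFF   # -1 terminates a group's pointer list


def scan_poll_table(table, schedule):
    """Count down every group's timer once; schedule expired groups.

    Returns True when at least one group expired, which corresponds to
    setting the rescan flag.
    """
    rescan = False
    i = 0
    while table[i] != 0:                          # poll time 0 ends the table
        poll_time = table[i]
        table[i + 1] = (table[i + 1] - 1) & 0xFF  # 8-bit countdown
        expired = table[i + 1] == 0
        if expired:
            table[i + 1] = poll_time              # reset the active timer
            rescan = True
        i += 2
        while table[i] != END:
            if expired:
                schedule(table[i] >> 4, table[i] & 0x0F)  # level, program
            i += 1
        i += 1                                    # step over the END byte
    return rescan
```

Groups that have not expired are skipped by walking to their terminator, so the cost of a pass grows with the total table length rather than with the number of programs actually scheduled.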


Priority-scanning routine (XSCN, listing 4)

This routine is responsible for performing
the priority decisions in the
executive. It will scan the level-control
tables, looking for a level requiring
service. XSCN will be enabled by the rescan
flag (which would have been set by XRTI,
if any programs were scheduled) or will
be cancelled if the flag is not set.

Upon finding any level that has a task
pending, priorities will be compared to
the current active level (normally
the level that was suspended at the time
the interrupt was received) in order to
determine whether to service the level
immediately or return to the interrupted
program. XSCN always starts its scan at
level 0 and sequentially advances to the
lower levels, thus servicing the highest
levels first. When a level request is found
and priorities allow, the XSCN routine
will exit to XLSC, the next program to
execute, setting the proper pointers and
status data.

*
* PROGRAM - XSCN
*
*   1) SCANS LVL CTRL TBLS
*   2) ENTERS XLSC ON PRIORITY BASIS WHEN LVL IS ACTIVATED
*

Listing 4: The priority-scanning routine, XSCN. This routine performs the
priority decisions in the executive. It will scan the level control tables
and, on a priority basis, find the level that is the next to be serviced.


Level-scanning routine (XLSC, listing 5)

This program will scan one individual
level searching for programs that have
been enabled by the clock-interrupt
routine. It is not concerned with priority
since it has already been given the
go-ahead by XSCN to execute this level. If
the "level interrupted" flag is set, then
program control will be returned to
whichever program was interrupted,
after restoring all status data. If the flag is
not set, XLSC will scan this level's job
stack for enabled programs and enter
them with the stack pointer set at the
level's stack area. Eventual return from
the programs will be vectored back to
XLSC, which will continue to execute the
enabled programs one by one until they
have all been run. When the level is
complete, it will return to XSCN, which will
find the next level to scan on a priority
basis, and in turn come back to XLSC, to
scan the next level, again using the XLSC
routine.

In this manner, the executive works its
way from the highest level to the lowest,
completing all programs on each level. If
it is in the middle of a scan and new
programs become enabled at higher levels
(which have already been scanned), the
current scan is aborted and a new one is
begun at the highest priority level. This
action is initiated by the rescan flag,
which is monitored at each interrupt.
Eventually, the executive will complete
a scan and have no more work to do.
At this time it will go into the executive
"no operation" loop and wait for an
interrupt to request more tasks. The job
loading, as mentioned in previous
sections, is directly related to how much
time the processor spends in this loop.
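
The interplay of the priority scan, the level scan, and the rescan flag can be modeled in a few lines. The sketch below is a simplified Python model, not the executive's code: interrupts are replaced by a hypothetical `run` callback that returns True when it has scheduled further work, and the job stack is a list of levels, each a list of entries with "enabled" and "scheduled" flags.

```python
def run_executive(job_stack, run):
    """Run all scheduled programs, highest level (index 0) first.

    run(level, index) executes one program and returns True when new
    work was scheduled meanwhile, which forces a rescan from level 0.
    Returns the order in which programs were entered.
    """
    order = []
    level = 0
    while level < len(job_stack):
        rescan = False
        for index, entry in enumerate(job_stack[level]):
            if entry["enabled"] and entry["scheduled"]:
                entry["scheduled"] = False   # cleared before entry
                order.append((level, index))
                if run(level, index):
                    rescan = True
                    break
        level = 0 if rescan else level + 1
    return order                             # nothing left: idle loop
```

Because the scheduled bit is cleared before a program is entered, a rescan never repeats finished work; it only picks up whatever has been newly scheduled at higher levels before returning to the lower ones.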


Miscellaneous Routines 

XEXC (listing 6) — Routine that causes a 
normal return to the executive. Return is
accomplished by a program jumping to
the beginning of the routine when it has
finished its task. XEXC reads the executive
status flags and reenters the executive at
the appropriate place, depending upon
whether there is a level scan active, a
rescan requested, or no levels active.

*
* PROGRAM - XLSC
*
*   1) SCANS 1 LVL, XTMP = LVL ADDR
*   2) ENTERS PGMS WHEN ENABLED AND SCHL'D
*   3) EXITS TO XSCN WHEN LVL COMPLETE
*
XLSC:  LD   HL,(XTMP)    GET LVL ADDR
       LD   E,(HL)
       INC  HL
       LD   D,(HL)
       INC  HL
       LD   (XPGM),DE    SAV XJS INDEX
       RES  7,(HL)       LVL REQ
       BIT  6,(HL)
       JR   NZ,1:        IF THIS LVL WAS IRPT'D
XLS1:  LD   HL,(XPGM)
       LD   A,(HL)       =PGM STAT WD
       INC  HL
       LD   E,(HL)
       INC  HL
       LD   D,(HL)
       INC  HL
       LD   (XPGM),HL
       CP   -1
       JP   Z,XSC1       IF END LVL
       RLA
       JR   NC,XLS1      NOT ENABLED
       RLA
       JR   NC,XLS1      NOT SCHED
       DEC  HL
       DEC  HL
       DEC  HL
       RES  6,(HL)       PGM SCHED BIT
       LD   HL,(XTMP)
       INC  HL
       INC  HL
       INC  HL
       LD   C,(HL)
       INC  HL
       LD   B,(HL)
       LD   L,C
       LD   H,B          HL=STACK ADDR
       LD   SP,HL
       EX   DE,HL
       DI
       LD   A,(XME)
       OR   XMSK
       OUT  02
       LD   A,(XLVL)
       OR   080
       LD   (XAL),A      SET THIS LVL CURRENT AND ACTV
       EI
       JP   (HL)
*
1:     BIT  5,(HL)
       JP   NZ,XSC1      SKIP IF LVL SUSP
       RES  6,(HL)       LVL IRPTD
       SET  7,(HL)       LVL REQ
       LD   A,(XLVL)
       LD   (XAL),A      SET THIS SCAN LVL ACTV
       JP   XRTN
*
* SUBR XLAD, CALC XLT ADDR FM A(LVL#)
*
XLAD   AND  0F           B0-3
       RLA
       RLA
       RLA               *8
       LD   E,A
       LD   D,0
       LD   HL,XLT
       ADD  HL,DE
       RET

Listing 5: The level-scanning routine, XLSC. This routine does the actual
scanning of any particular level. It is entered from XSCN, which has
already determined priority and authorized XLSC to scan a given level.
Depending on the status of the level, this routine will either enter the
level (if scheduled), skip the level (if suspended), or reenter the level (if
interrupted).


XIN (listing 6) — Cold-start initialization 
routine. This routine resets all flags and 
pointers for a safe entry into the executive.
It also initializes the Z80 interrupt
register and writes the proper interrupt
mask to the hardware. This is the
only place the executive should be 
entered from a cold start or after linking 
an object program into the executive. 


XUTO (listing 7) — Executive 
subroutine for saving the working 
registers, processor status data, and stack 
pointer for any particular level. This 
subroutine normally is used by the inter- 
rupt service routines to save status data at 
the interrupt level. On entry, the Z80 
registers contain the data to be saved, 
and the pointer XAL contains the level 
under which it is to be saved. The routine 
returns with all registers and stack pointer 
saved, level flags set, and the stack 
pointer set to the executive stack. After 
calling this routine, the user program is 
then free to alter any registers or stack 
data without destroying any status data of 
the interrupted program. 


XENP (listing 7) — Enable-program 
routine. This routine allows a user pro- 
gram to access the job stack and 
schedule programs on demand, indepen- 
dent of the clock routine. Thus, programs 
may be enabled to execute based on ex- 
ternal events rather than through the 
scheduler. On entry, the first call 
parameter of the calling routine contains 
the level and program number (identical 
to the poll table format) to be enabled. 
Naturally, when enabling a program 
through XENP, the status data of the cur- 
rent program is saved and a priority com- 
parison is done such that priority is main- 
tained, even in this mode of scheduling. 
In this way, a program that is running on 
a low level may enable a higher-level pro- 
gram, resulting in the higher-level pro- 
gram running and completing before 
returning from the XENP call to the 
lower-level caller. This feature is
essential in some applications, though
not in all.


XEN2 (listing 7) — Similar to XENP, ex- 
cept that interrupt status is not saved and 
return is always made immediately to the 
caller. This is generally used by routines
that are running at the interrupt level,
having already suspended the inter- 
rupted program, saved the status data, 
and taken priority from the executive. 


Efficiency 

Does the executive load down the
processor considerably with its priority
scheduling? How much time does it
consume? Is it efficient? The executive's
overhead can be approximated by
summing the execution times of the
various loops required to schedule and
enter programs. Assuming an executive
with eight levels, at four programs per
level (thirty-two total), using the Z80
(2.5 MHz), the following can be
calculated:


Clock routine, poll table scanning       2.7 ms
Priority level scanning and execution    5.5 ms
TOTAL                                    8.2 ms


These figures may vary slightly depend- 
ing on the exact arrangement of job stack 
and poll tables. The total time is 8.2 ms; 
for thirty-two programs scheduled once 
per second, the executive loading is 0.82 
percent, and for thirty-two programs 
scheduled at 100 ms, the loading is 8.2 
percent. 

Depending on the poll intervals, a 
system with thirty-two programs will have 
an executive loading somewhere be-
tween the above two numbers. Typi-
cally, the result should be very close to
0.82 percent, since in a typical home
system, very few programs will be re-
quired to run at ten times per second.
Also, the total number of programs would
be far fewer than thirty-two, giving a
directly proportional reduction in
overhead. Thus, the above figures can be
considered as a maximum for most home
computer systems.


Implementation Examples 

Now that we have the basic real-time 
system to handle programs on a priority 
basis, we must give the system some 
practical tasks to perform. This section
will describe the interface to the ex-
ecutive using general examples of the
most common programs and peripherals
which may be found on a typical home 
computer system. 

The objective of this executive design, 
as far as software is concerned, is to be 
able to easily interface new or existing 
programs without having to write special
coding to do so. From the hardware point
of view, the general interface philosophy
is to parallel the previous ready, strobe,
or status lines from the peripheral devices
as interrupt inputs to the hardware priori-
ty chain. The already existing input and
output ports to these devices are still
used to communicate to the real-time 
system, as well as to previous systems, 
with no changes required other than the 
loading of different software. 


;
; NORMAL RET TO EXEC
;
XEXC: DI
      LD   A,(XAL)
      BIT  7,A
      JP   Z,XNOP      ; IF NO LVL ACTV
      RES  7,A
      LD   (XAL),A     ; SET INACTV
      LD   SP,XSTK
      EI
      LD   A,(XRSC)
      AND  A
      JP   NZ,XSC2     ; IF PGMS REQ WHILE AWAY
      LD   A,(XLVL)
      BIT  7,A
      JP   Z,XNOP      ; IF SCAN NOT ACTV
      JP   XLSC+3      ; CONT SCAN
;
; - STACK AREA -
;
      DS   40          ; LONGER STK FOR KB RTNS
XST0  EQU  $
;
      DS   32
XST1  EQU  $
;
      DS   32
XST2  EQU  $
;
; - NOTE XST-3,4,5,6,7 PLACED IN PAGE 4
;
; PROGRAM - XIN
; COLD START INITIALIZATION ROUTINE
;
XIN:  DI
      XOR  A
      LD   HL,XAL
      LD   (HL),A
      INC  HL
      LD   (HL),A      ; LVL
      INC  HL
      INC  HL
      LD   (HL),A      ; RSC
      IM   2
      LD   HL,XIVT     ; IRPT TABLE
      LD   A,H
      LD   I,A         ; SET IRPT PAGE #
      LD   A,(XME)
      OR   XMSK        ; SET IRPT MASK
      OUT  02
      LD   HL,XLT      ; STRT AT TOP OF LVLS
1:    LD   A,(HL)
      CP   0FFH
      JP   Z,XEXC      ; DONE
      INC  HL
      INC  HL
      XOR  A
      LD   (HL),A      ; CLR LVL STATUS
      LD   DE,6
      ADD  HL,DE       ; ADV TO NXT
      JR   1
;



Listing 6: The normal-return and cold-start routines, XEXC and XIN. 
XEXC is the normal return to the executive after any program has finish- 
ed execution. This routine reads the executive status and enters the ex- 
ecutive at the proper point. XIN is the cold-start initialization routine, 
which is the only point from which the executive should be entered 
after loading the executive into memory or after running other software. 
It initializes pointers and flags in the executive, ensuring a safe execution.
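
The decision XEXC makes on a normal return can be summarized outside of assembly language. The following is a rough sketch only: it assumes, as the listing's bit tests suggest, that bit 7 of XAL marks a level as active, that a nonzero XRSC is a rescan request, and that bit 7 of XLVL marks a scan in progress. The function name and the returned strings are illustrative.

```python
ACTIVE_BIT = 0x80  # bit 7, the bit examined by the BIT 7,A tests in XEXC


def normal_return(xal: int, xrsc: int, xlvl: int) -> tuple[str, int]:
    """Return (next entry point, updated XAL) for a program exiting to XEXC."""
    if not xal & ACTIVE_BIT:
        return "XNOP", xal        # no level active: no-operation loop
    xal &= ~ACTIVE_BIT & 0xFF     # mark the finished level inactive
    if xrsc != 0:
        return "XSC2", xal        # programs were requested meanwhile: rescan
    if xlvl & ACTIVE_BIT:
        return "XLSC+3", xal      # a scan was in progress: continue it
    return "XNOP", xal            # nothing left to do


print(normal_return(0x83, 0xFF, 0x00))
```

The stack pointer reset and the interrupt disable and enable steps of the real routine are omitted here; only the branching is modeled.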


; EXC UTIL, SAVS STAT & SP, SETS LVL IRPTD
;
XUTO: DI
      LD   (1),HL      ; TMP SAV
      POP  HL
      LD   (2),HL      ; SAV XUTO RET
      LD   HL,(1)      ; RESTORE HL
      PUSH AF
      LD   HL,XAL
      BIT  7,(HL)
      RES  7,(HL)
      JR   NZ,3        ; IF LVL ACT
      LD   A,0FFH
      LD   (XRSC),A
      LD   SP,XSTK
      LD   HL,(2)
      JP   (HL)        ; BACK TO CALLER
3:    SET  6,(HL)      ; LVL IRPTD
      LD   HL,(1)      ; RESTORE
      PUSH BC          ; CONT, SAV REG
      PUSH DE
      PUSH HL
      PUSH IX
      PUSH IY
      EXX              ; SAV AUX REG
      PUSH BC
      PUSH DE
      PUSH HL
      EXX
      LD   A,(XAL)
      CALL XLAD        ; HL=LVL ADDR
      LD   A,(XME)
      OR   XMSK
      OUT  02          ; SET XEXC MSK+EXT BITS
      INC  HL
      INC  HL
      SET  6,(HL)      ; LVL IRPTD
      LD   B,H
      LD   C,L
      INC  HL
      INC  HL
      INC  HL
      EX   DE,HL
      LD   HL,0
      ADD  HL,SP
      EX   DE,HL       ; DE=SP, HL=SAV ADDR
      LD   (HL),E
      INC  HL
      LD   (HL),D
      LD   SP,XSTK
      LD   HL,(2)
      PUSH HL          ; RET
      LD   H,B
      LD   L,C         ; HL=LVL ADDR
      RET              ; BACK TO CALLER
1:    DW   0           ; TMP HL SAV
2:    DW   0           ; RET SAV
;
; 1) XENP - ENABLE PGM RTN
;       DB   0LP       L=LVL#, P=PGM#
; 2) XENL (SAME AS XENP, BUT A HAS 0LP)
;    XENP, XENL ENABLE AT PGM LVL, & RET TO CALLER ON PRIO BASIS
; 3) XEN2, SAME AS XENL, BUT RET IMMED, FOR IRPT LVL SERVCD
;
[The source lines for XENP, XENL, and XEN2 are not legible in this copy.]
;
; Z80 IRPT VECTOR TABLE (STARTS AT END OF PAGE BOUNDARY)
;
      ORG  XAL+03F0H   ; TOP OF PAGE 3
XIVT  EQU  $
      DW   XIN         ; 0
      DW   XIN         ; 1
      DW   XIN         ; 2
      DW   XRT         ; RTC
      DW   XIN         ; 4
      DW   XIN         ; 5
      DW   KBI0        ; KYBD 0
      DW   KBI1        ; KYBD 1
;
; - REST OF XEXC STACK -
;
[The remaining stack definitions and the symbol table are not legible in this copy.]


Listing 7: Enable-program and save- 
registers routines, XENP and XUTO, and 
the Z80 mode-2 interrupt table, XIVT. 
Routine XENP is an enable program 
routine that allows user programs to in- 
dex the job stack and thereby enable pro- 
gram execution regardless of the real- 
time clock scheduler. Priority is main- 
tained by this routine. XUTO saves the 
status data of an interrupted program. 
The level indicator variable XAL points to 
the level for which the status data will be 
saved. XIVT is the Z80 mode-2 interrupt- 
vector table. It contains the addresses of 
the direct-interrupt service routines link- 
ed to each interrupt line. 


[Figure 5 diagram. Entries shown: Normal Return to Executive, Initialization, Keyboard(N) Interrupt, and Real-Time Clock Interrupt (see figure 6).]

Figure 5: A Warnier-Orr diagram of the real-time executive program structure. This diagram shows the three possible
dynamic entry points to the executive and an example of a common peripheral interrupt device (keyboard). The “normal
return to executive” entry at top is from programs that have finished executing and are returning program control
to the executive.


[Figure 6 diagram: structure of the real-time clock interrupt.]

Figure 6: A Warnier-Orr diagram of the real-time clock interrupt structure. This diagram shows the general sequence of
events occurring within or subsequent to the real-time clock interrupt. (This figure is a continuation of figure 5.)


Interfacing keyboards 
Hardware: 

Figure 7 presents two variations of a 
typical keyboard/computer interface. 
The keyboard will be connected to the 
processor I/O bus and read through a 
typical input instruction. The lower bits of 
the port will have the ASCII code for any 
key, and one of the high-order bits 
(usually the most significant bit) will be 
the key-pressed strobe signal. Alternative- 
ly, for remote keyboards there may be a 
serial interface from a remote station to 
the computer, with a serial receiver 
(usually a UART) connected to the I/O 
bus or buffered through an I/O port. In 
this case, the interface is similar except 
that the data-available status signal of the 
serial receiver is used to generate the key- 
pressed strobe, which is also connected 
to the most significant bit of the input 
port. It is very easy to interface these 
keyboards to the real-time system. The 
hardware configuration is left exactly as is 
(thereby not interfering with the original
mode of operation), and an extra line is
added from the strobe bit to the 
interrupt-request latch, thus paralleling 
the most significant bit of the input port. 
The strobe signal on some systems may 
need further conditioning before being 
presented to the interrupt-request latch. 
This, however, depends upon the hard- 
ware setup used to set and reset the inter- 
rupt request. This arrangement allows the 
same keyboard to be used in either mode 
of operation, without need of any hard- 
ware changes to switch from one mode 
to another. Batch-mode operation will be 
required to run with interrupts disabled; 
otherwise, the keyboard-interrupt line 
will have to be disconnected for this 
mode. 


Software: 

The appropriate software interface to 
the keyboards shown in figure 7 would 
be a subroutine that accepts inputs from 
either keyboard on the system. Usually 


Figure 7: Two block diagrams showing possible keyboard interfaces to the real-time executive. In both the parallel (a)
and serial (b) cases, the keyboard can be interfaced to the executive using the already existing key-pressed or data-
available signals to generate the keyboard interrupt. Paralleling these signals allows the same hardware configuration
to be used for both real-time and batch modes of operation.


;
; SUBR, IP FM KYBDS ON REAL TIME BASIS
;
; -USE: CALL INKB
;       DB   N         N=KYBD#
;
; -OR:  CALL INK1      A=KBD#
;
; USES A, BC
;
INKB: POP  BC
      LD   A,(BC)
      INC  BC
      PUSH BC          ; ADV RTRN
INK1: LD   BC,0        ; KB TBLS
      AND  03
      RLCA             ; *2
      ADD  C
      LD   C,A
      JR   NC,$+3
      INC  B           ; ADD A TO BC
      PUSH BC          ; SAV TBL ADDR
      INC  BC
      LD   A,(BC)
      RLA              ; TST B7
      JR   C,4         ; IF CHAR READY NOW
      DEC  BC          ; ELSE RCL INDX
      LD   A,(XAL)
      SET  7,A         ; SET ACTV
      LD   (BC),A      ; SET ENBL, AND USER LVL#
      LD   BC,4        ; BC=IRPT RET ADDR
      PUSH BC
      CALL XUTO        ; FORCE LVL IRPTD
      SET  5,(HL)      ; SUSP LVL
      LD   HL,XAL
      RES  6,(HL)      ; DE-ACTV CUR LVL
      EI
      JP   XSCL        ; ON TO NXT LVL IN XJS
;
4:    EQU  $           ; IRPT RET ADDR
      POP  BC          ; TBL ADDR
      INC  BC
      LD   A,(BC)
      PUSH AF
      XOR  A
      LD   (BC),A      ; RESET CHAR RCV FLG
      POP  AF          ; RCL CHAR
      RET              ; BACK TO CALLER
;
; -KYBD TBLS, 2 BYTES PER TBL
;
INKT  EQU  $
0:    EQU  $           ; TBLS STRT
      DB   0           ; KB0, LVL FLG
      DB   0           ; CHAR SAV
;
1:    DB   0           ; KB1, LVL FLG
      DB   0           ; CHAR SAV
;
; DIRECT INTERRUPT HANDLER FOR ALL KEYBOARDS
;   KEYBOARD 0 - KBI0
;   KEYBOARD 1 - KBI1
;
KBI0  EQU  $           ; KB0 DIRECT IRPT HANDLER
      CALL XUTO        ; SAV STAT
      LD   A,0
      LD   C,0         ; A=KB#, C=DA
8:    EQU  $           ; COMMON ENTRY FOR ALL KYBDS
      LD   HL,0
      AND  03          ; MSK KBD #
      RLCA             ; *2
      ADD  L
      LD   L,A
      JR   NC,$+3
      INC  H
      IN   A,(C)       ; READ CHAR
      INC  HL
      SET  7,A         ; SET STRB IN CASE MISSED
      LD   (HL),A      ; SAV CHAR
      DEC  HL
      CP   090H        ; CHK CTRL P
      JR   NZ,7
      LD   A,(XDBF)
      AND  A           ; CHK FOR DBUG IN USE
      JR   NZ,7
      PUSH HL          ; SAV INDX
      LD   A,01        ; L0,P1
      CALL XEN2        ; ENABLE DBG PGM
      POP  HL          ; RCL INDX
7:    BIT  7,(HL)
      JR   NZ,9        ; IF KB ENBLD
      JP   XRTN        ; BACK TO PGM IF NOT
;
9:    RES  7,(HL)      ; ACK CHAR RCVD
      LD   A,(HL)      ; LVL #
      CALL XLAD        ; GET USER LVL#
      INC  HL
      INC  HL
      RES  5,(HL)      ; LVL SUSP
      EI
      JP   XSC2        ; DO XJS RSCN
;
KBI1  EQU  $           ; KB1 DIRCT IRPT HANDLER
      LD   A,1
      LD   C,05
      JR   8           ; ENTER RTN
;


Listing 8: The real-time keyboard input routine. This routine is given in two parts. The first part, INKB, suspends the cur-
rent level and enables the keyboard input. The second part, KBI0, is the direct-interrupt handler for the keyboards.
This routine reads the keyboard, saves the character, and eventually returns to INKB on a priority basis through the executive.


this subroutine is called with a register 
pointing at the keyboard number or the 
device address of the keyboard to be 
read. When a key is pressed, the routine 
returns with a register or memory loca- 
tion containing the ASCII code for the 
key pressed. These subroutines work by 
reading the status of the strobe bit, loop- 
ing until true, and returning with a 
register set to the value read last. Usually, 
for well-structured software, there will be 
only one such routine that all higher pro- 
grams use. Therefore, to make a simple 
interface of new or existing software to 
the real-time executive, we create a 
similar routine that functionally replaces 
it. All calls in the existing software to the 
old routine would be replaced by calls to 
the real-time keyboard routine. The 
higher-level programs can therefore run 
the software with no changes required. 
These programs will be unaware that the 
keyboard routine has been changed. 
We cannot create a real-time pro- 
gram that loops on a keyboard strobe. It 


would waste too much time, or it could 
be interrupted and miss the strobe signal. 
How, then, do we handle a keyboard in- 
put on a real-time basis? 

Listing 8 shows the keyboard program 
and its associated interrupt service 
routine, which is functionally identical to 
the looping keyboard input routine. This 
routine handles any number of 
keyboards and users “simultaneously”
on a real-time basis. It always returns to 
the proper user with the received ASCII 
key and with all registers and status data 
restored (except for the accumulator, 
which contains the received key, and the 
BC register pair, which contains the 
address of the keyboard table). 

This routine is split into two parts. The 
first part, INKB, indexes the appropriate 
keyboard table, sets enable flags in this 
table, saves all status data, and then 
suspends the current level. Program con- 
trol is then given to the executive, which 
will continue with the other levels on the 
system, bypassing the level that is requesting
the key; this last level stays in
the suspended state. Thus, the program 
that called the keyboard routine will also 
be suspended, just as if it were looping 
on a key-strobe signal. 

The second part, KBI0, is the actual
keyboard interrupt service routine. This
routine saves the status of the program
that was interrupted when the key was
pressed, indexes its own appropriate
keyboard table, and checks to see which
level is requesting a key from this
keyboard. If no levels have made a re-
quest, then the routine simply returns to
the interrupted program. If some level is
waiting on this keyboard, the keyboard
will be read, the input-key data will be
saved in the table, the requesting level
will be reenabled, and then the routine
will be exited to a rescan of the executive
levels. The rescan will eventually result
(as priorities allow) in the original
suspended key-requesting level being
reentered with all status data restored.
This reentry is back to the routine INKB,
which then loads the saved key from the
table and returns to the caller. The calling
program continues on, unaware that this
has occurred and is content with the key
just received.

An additional note: this particular in-
terrupt routine has an extra bit of coding
that is used to enable the executive
debug program upon receiving a Control-
P. This function is done at the interrupt
level to give an absolute priority to the
debug program (thus providing a type of
“panic button” recovery from any
routine without disturbing the real-time
system). This coding does not interfere
with the normal keyboard-input function
and has other uses that will be discussed
later.
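
The division of labor between the two parts can be illustrated with a small model. This is a sketch of the idea only, not a translation of listing 8: the class, the function names, and the use of a set for suspended levels are all assumptions made for the illustration.

```python
class KeyboardTable:
    """Per-keyboard table entry (hypothetical layout)."""

    def __init__(self):
        self.waiting_level = None  # level suspended on this keyboard, if any
        self.saved_key = None      # key captured by the interrupt handler


def request_key(table, suspended_levels, level):
    """First part (the role INKB plays): flag the request, suspend the level."""
    table.waiting_level = level
    suspended_levels.add(level)


def keyboard_interrupt(table, suspended_levels, key):
    """Second part (the role of the interrupt handler).

    Returns True when the executive should rescan its levels."""
    if table.waiting_level is None:
        return False               # no request pending: resume interrupted program
    table.saved_key = key
    suspended_levels.discard(table.waiting_level)
    return True


def collect_key(table):
    """Run when the requesting level is reentered: hand back the saved key."""
    key = table.saved_key
    table.waiting_level = None
    table.saved_key = None
    return key


table, suspended = KeyboardTable(), set()
request_key(table, suspended, level=3)
if keyboard_interrupt(table, suspended, "A"):
    print(collect_key(table))
```

While level 3 is in the suspended set, the executive is free to run every other level, which is the whole point of replacing the strobe-polling loop.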


Foreground/Background Programs

As shown in the previous section, we
now have a method to use many of our
existing programs on a real-time basis by
implementing a real-time keyboard
routine, which is the functional equiva-
lent of previous batch-type keyboard
routines. In addition to this I/O interface,
the program itself must be linked to the
executive and given a priority.

There are two types of programs
discussed in this section, referred to here
as foreground and background programs.
As used here, foreground refers to pro-
grams that are continuously enabled and
running on a regular basis.
Sometimes referred to as polled or cyclic
programs, these types will perform their
tasks at regular intervals through the
scheduler. They normally will not be


associated with external devices such as 
keyboards or peripherals which require 
status looping or interrupts. Examples of 
these programs (from the system des- 
cribed earlier) are: security monitoring, 
analog input monitoring, timekeeping, 
etc. The linking of these programs is sim- 
ple and straightforward, merely requiring 
the user to assign a priority level to the 
program, enter its starting address in the 
job stack at that level (make sure that the 
exit point of the program is to XEXC of the 
executive), assign a poll time in one of 
the poll tables, and set the enable bit in 
the job stack. 
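
The linking steps just listed can be sketched as follows. The table layouts here are hypothetical stand-ins (a dictionary per table) chosen for readability; the actual job stack and poll table formats are those given in the executive listings, and the addresses and intervals are made-up values.

```python
from dataclasses import dataclass, field


@dataclass
class JobEntry:
    start_address: int       # entry point of the program
    enabled: bool = False    # stands in for the enable bit in the job stack


@dataclass
class ExecutiveTables:
    job_stack: dict = field(default_factory=dict)   # level -> list of JobEntry
    poll_table: dict = field(default_factory=dict)  # (level, program) -> interval, ms


def link_foreground(tables, level, start_address, poll_ms):
    """Link one foreground program; returns its program number on the level."""
    entries = tables.job_stack.setdefault(level, [])
    entries.append(JobEntry(start_address))       # starting address in job stack
    program = len(entries) - 1
    tables.poll_table[(level, program)] = poll_ms  # poll time in a poll table
    entries[program].enabled = True                # set the enable bit
    return program


tables = ExecutiveTables()
number = link_foreground(tables, level=2, start_address=0x2000, poll_ms=1000)
print(number, tables.poll_table[(2, number)])
```

The one step the sketch cannot show is the program's own obligation: its exit point must return to XEXC.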

Background programs are referred to 
here as those that may be required to be 
suspended (eg: waiting for a peripheral) 
or simply disabled for a period of time. 
They may also be those that are of lesser
importance in terms of execution speed, 
and perhaps those that take considerable 
time themselves to execute. An example 
would be the BASIC calculation des- 
cribed earlier. 

Background programs must also be 
given a priority and placed in the job 
stack. However, poll times are not given 
to these programs, since they are not to 
be scheduled regularly. Examples of 
these are: disk or tape operating systems 
that handle programs such as high-level 
languages, text editors, assemblers, etc; 
keyboard input utilities such as system 
debugging; or other programs that ser- 
vice peripherals and operate in an 
unscheduled manner. 

One necessary requirement of 
background programs is a means for 
them to be initially enabled and eventual- 
ly disabled when completed. This must 
be done by a user-initiated action that 
causes the execution of a routine that 
eventually enables the program. A 
special button that activates the program 
request by being polled or by causing an 
interrupt can be used for this operation, 
although a more logical device is the 
keyboard. Use of the keyboard, 
however, requires special provisions, 
since it is also used for normal operator 
communication and data entry, which 
must not be confused with function 
selection. A good example of how to im- 
plement this feature is shown in listing 8, 
where the Control-P is used to enable the 
debug program, which is a background 
executive utility. 

By implementing such a scheme at the 
interrupt level, global priority may be 
given to a keyboard to enable or disable 
background programs, regardless of 
whether the keyboard is or is not current-
ly servicing a function. A proposed
scheme is to extend the example of listing
8 and use a separate control key for each
major background program on the
system, thus allowing users at different
terminals to enable and use these
background programs (provided they are 
not already in use at another terminal). 


Priority Assigning 

Now that the two major types of pro- 
grams have been identified, how do we 
determine what priority to assign these 
programs? Priority should be selected ac- 
cording to how the user wishes the 
system to operate, with respect to the 
priority rules presented earlier. Normally 
priority will not be extremely critical, 
unless the programs themselves consume 
a very large amount of time or are critical 
in respect to the execution interval. As an 
example, an assembler which consumes 
large amounts of time must be put in a 
low-priority region. However, also within 
this low-priority region are programs that 
may be simultaneously resident with the 
assembler. It must be determined which 
one should be completed before the 
others are allowed to run. Since they all 
have a relatively low priority, it is not a 
critical decision and is determined by the 
designer’s own preference, based on the 
average amount of time each program 
will take, as well as various other system 
configuration aspects. 

Foreground programs should always be 
put in the medium priority region above 
any background programs that are likely 
to consume a lot of time. These are the 
programs that are expected to execute 
once per second, for example. Medium 
priority foreground programs may not be 
executed at exactly the time interval 
specified by their poll time, since these 
levels are subject to being interrupted by 
higher levels. As a result, execution is 
delayed past the poll time by an amount 
equal to the total execution time of all the 
interrupting programs. 
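
As a small worked example of this delay, the sketch below adds the execution times of the interrupting programs to the nominal poll time. The figures used are invented for the illustration and are not measurements from the executive.

```python
def actual_start_ms(poll_time_ms, interrupting_runtimes_ms):
    """Time at which a medium-priority program really starts, given the
    execution times of the higher-level programs that run ahead of it."""
    return poll_time_ms + sum(interrupting_runtimes_ms)


# A program polled at t = 1000 ms, delayed by two higher-level programs
# that take 12 ms and 30 ms (hypothetical values).
print(actual_start_ms(1000, [12, 30]))
```

With no interrupting programs the start time equals the poll time, which is why only the highest levels can hold close tolerances.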

Foreground programs required to run 
at exceedingly close tolerances to their 
poll time should be placed at the highest 
priority levels. Those foreground pro- 
grams performing operations on which 
lower-level programs rely should also be 
placed at the highest levels. For example, 
assume there is an analog input routine 
that reads a multiplexed analog-to-digital 
converter and saves the data in memory 
for subsequent use by other programs. 
Assume, also, that both the analog input 
routine and the programs that use its data 
are run once per second. Naturally, the 
analog read program should execute first 
so that the data used by the other programs
will be the most current readings.
Therefore, the priority of the analog-read 
program should be set higher than that of 
the other programs. 

For programs on the same level, which 
has the highest priority? Does priority 
even exist on any individual level? For 
this executive, all programs on a given 
level have equal priority. However, a 
slight difference in their execution mode 
may be thought of as a type of priority. 
Due to the manner in which the 
executive routine XLSC scans an in- 
dividual level for task requests, the pro- 
grams listed first in the job stack will be 
executed first in most instances. Hence 
the first programs have a higher priority. 
However, this is a limited priority in that 
it does not allow a program listed first, for 
example, to interrupt a program listed 
lower. Any time a program starts to ex-
ecute on a level, it must complete before
any other program on that level is allowed to
run. The rule then for priority on an in- 
dividual level is ‘‘first come, first served’’ 
unless the programs arrive simultaneous- 
ly; in this case, the first listed in the job 
stack will be the first to run. 
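
The rule for a single level can be stated compactly in code. This is a sketch of the rule as described, not of the XLSC scan itself; representing each request as an (arrival time, job stack position) pair is an assumption made for the example.

```python
def next_program(pending, currently_running=None):
    """Choose which request on one level runs next.

    pending: list of (arrival_time, job_stack_position) tuples.
    A program already running on the level is never displaced."""
    if currently_running is not None:
        return currently_running
    if not pending:
        return None
    # Earliest arrival wins; simultaneous arrivals fall back to the
    # position in the job stack, because tuples compare element by element.
    return min(pending)


print(next_program([(5, 0), (3, 2)]))   # first come, first served
print(next_program([(3, 2), (3, 1)]))   # tie: first listed in the job stack
```

Note that the job stack position matters only in the tie case; it never lets an earlier-listed program interrupt one that has already started.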

As indicated, a foreground program 
with a certain poll time should not be 
placed on the same level or on a lower 
level than a background program that is 
likely to consume more time than the 
poll interval of the foreground program. 
Otherwise, polling of the program may 
be skipped due to an overload at higher 
levels. Another point is that programs 
scheduled at frequent intervals should be 
of short duration in comparison to the in- 
terval; otherwise, the executive simply 
lacks time to execute all of the work at 
that level or lower. 


Reentrant Programming 

To obtain the highest efficiency possi- 
ble in custom user software, reentrant 
type programming should be used. To il- 
lustrate reentrancy, consider a typical 
subroutine such as a software-multiply 
algorithm, a subroutine likely to be used 
by many programs running on different 
levels. Assume that this subroutine uses 
temporary or scratch-pad storage during 
its execution to store an intermediate 
result. If the program is interrupted, and 
if the scratch pad is the working register 
of the processor, then the data will be
saved by the interrupt service routine and 
subsequently restored upon reentering 
the subroutine at a later time. If the 
scratch area is the processor stack, then 
that too will be restored upon return, 
with still no problem. If the scratch area is 
a particular memory location used only
by that subroutine, then the data will be
there when the program returns. On the 
other hand, suppose while the original 
user of the multiply routine is suspended, 
a higher level uses the same multiply 
routine. The scratch-pad memory area 
has then been destroyed as far as the 
original user is concerned; this original 
user will eventually return from the 
multiply routine with the wrong result. 
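
The failure just described can be reproduced in miniature. In the sketch below, a multiply by repeated addition is written twice: once with its partial result in a fixed shared location, and once with the partial result local to each call. Python generators stand in for interruptible routines, with each `yield` marking a point where an interrupt could occur; all names are invented for the illustration.

```python
_scratch = 0  # fixed memory location shared by every caller


def multiply_non_reentrant(a, b):
    global _scratch
    _scratch = 0
    for _ in range(b):
        _scratch += a
        yield                 # a point at which an interrupt may occur
    return _scratch


def multiply_reentrant(a, b):
    partial = 0               # private to this call, like a register or stack slot
    for _ in range(b):
        partial += a
        yield
    return partial


def finish(routine):
    """Run an interrupted routine to completion and return its result."""
    try:
        while True:
            next(routine)
    except StopIteration as stop:
        return stop.value


def run_interrupted(multiply):
    low = multiply(3, 4)      # low-level user starts 3 * 4
    next(low)
    next(low)                 # two additions done, then it is interrupted
    high_result = finish(multiply(5, 2))   # higher level uses the same routine
    low_result = finish(low)  # original user resumes
    return low_result, high_result


print(run_interrupted(multiply_non_reentrant))  # low-level result is corrupted
print(run_interrupted(multiply_reentrant))      # both results are correct
```

In the non-reentrant case the higher level leaves 10 in the shared location, and the suspended user adds its remaining two steps to that value instead of its own partial sum.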
If other types of subroutines are the
topic, more disastrous types of errors
could occur. The above subroutine,
which uses an absolute memory area for
temporary storage, is not reentrant and
could cause problems if linked to the ex-
ecutive and used by multiple levels.

Reentrancy is discussed here to aid the
user in writing real-time programs and
subroutines. Also, be cautioned when
linking any existing software to the ex-
ecutive: existing programs are probably
written in a non-reentrant manner, since
the programmer probably did not an-
ticipate their use on a multilevel system.
Therefore, it is advisable to carefully ex-
amine existing software before linking it
to the executive.

How can reentrancy always be main-
tained when writing new software? As a
means for temporary saving of data on
most systems, the following are a few
hints that may be helpful:


• Use the working registers as much as
  possible.

• Use the processor stack.

• Use index registers to point at memory
  areas unique to the level.

• Only call a particular subroutine from
  one level.

• Disable interrupts when the data is [...]

[...] as being the most efficient for almost
all microprocessors. Restricting subrou-
tine usage is, of course, inefficient; if
multiple levels need a particular routine,
then a separate version of that routine
must be coded for each level and called
from that level exclusively. Disabling
interrupts should be used only as a
last resort. A particularly poor solution,
it leaves interrupts disabled for extended
periods of time and can adversely affect
system operation, particularly when high-
[...]ure driven devices are being [...]


Peripheral Interface

This section will discuss the interface of
a high-speed peripheral, such as a disk or
digital cassette, to the real-time executive.


In these cases, we are talking about 
devices that will generally have the 
highest priority on the system and will be 
the major user of processor time at the in- 
terrupt level. 

The first requirement of any device is 
that it be interrupt driven, using one or 
more of the ready-status signals to 
generate the interrupt. The second is that 
a reasonable amount of hardware 
sophistication exists such that software is 
not required to control every menial 
function of the device. (If it is software 
controlled, then it may still be used with 
the executive; however, the system will 
be subject to sustained periods with inter- 
rupts disabled.) 

To illustrate the general procedures the 
executive software would use in a typical 
system, the example of a simple disk 
operating system is used. (Details of the 
actual driving routines are omitted, since 
they may vary widely from one disk 
operating system to another.) Assume the 
reading of a data block from the disk; a 
generalized example of the communica- 
tion of a disk system to the executive is as 
follows: 


• First, the user will request the high-level programs of the disk
  operating system to be enabled (with a particular control key, for
  example).

• The user enters the required information for the desired file through
  the keyboard.

• A routine of the disk system calculates the required sector of the
  beginning of the block on the disk, based on its various tables and
  directory information. This information, along with a buffer pointer
  and byte count, is initialized and ready for input.

• A low-level subroutine issues a seek command to the disk controller
  for the required track and sector. It then suspends its current level
  and returns program control to the executive.

• Upon receipt of the disk-ready interrupt signal, the connected
  interrupt service routine saves the current status data and points to
  the input buffer and byte count. The routine then loops on the ready
  status signal, reading the required block to the memory buffer, one
  byte at a time, until the byte count has decremented to zero. This is
  all done with interrupts left disabled so as not to miss any of the
  incoming data.

• When the block is read, the priority level that was using the disk is
  reenabled, and control is given back to the executive (which will then
  scan priorities).

• As priorities allow, the original suspended level is reentered. The
  new data in the buffer is then transferred, manipulated, checked, etc.
  The above is then repeated, starting with step 3, any time a block is
  read from the disk.
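
The interrupt-level part of this sequence can be sketched in a few lines. What follows is only an illustration in Python, not the driving routine of any actual disk system: the FakeDiskController class, its ready and read_byte methods, and the 16-byte block are all assumptions made up for the example. The point it shows is the loop on the ready status that runs until the byte count reaches zero.

```python
class FakeDiskController:
    """Hypothetical stand-in for the disk hardware."""

    def __init__(self, sector_data):
        self._data = list(sector_data)
        self._polls = 0

    def ready(self):
        # Pretend the ready line is high on every second poll.
        self._polls += 1
        return self._polls % 2 == 0 and bool(self._data)

    def read_byte(self):
        return self._data.pop(0)


def disk_ready_isr(controller, buffer, byte_count):
    """Interrupt-level block read: loop on ready, one byte per pass."""
    index = 0
    while byte_count > 0:            # interrupts stay disabled for this loop
        if controller.ready():
            buffer[index] = controller.read_byte()
            index += 1
            byte_count -= 1
    return index                     # number of bytes transferred


sector = bytes(range(16))            # made-up block contents
buf = bytearray(len(sector))
moved = disk_ready_isr(FakeDiskController(sector), buf, len(buf))
```

Because the loop never returns until the count is exhausted, nothing else can run while it executes, which is why the block size directly sets the delay seen by every software level.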


This description is very generalized, 
with many details left to be filled in 
depending on the particular disk system. 
The major point is that the disk will ac- 
tually be read or written at the interrupt 
level, and the operating system that in- 
itializes, sends commands, and 
manipulates the data is run at a medium 
or higher software level of the executive. 

On this type of system, events will oc- 
cur at the interrupt level for durations of 5 
to 20 ms, depending on the block size 
and data transfer rate, indicating that 
even the highest software level will be 
subject to at least this amount of timing 
delay. Also note that by assigning the 
disk-system background programs to 
high-priority levels, the return from the 
buffer input routine may be made almost 
instantly, such that sequential sectors of a 
disk may be read without having to issue 
separate commands for each sector read. 
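
The dependence of this delay on block size and transfer rate is simple arithmetic. The sketch below assumes a transfer rate of 250,000 bits per second and block sizes of 256 and 512 bytes; these figures are chosen only for illustration and are not taken from any particular disk system.

```python
def block_transfer_ms(block_bytes, bits_per_second):
    """Time spent at interrupt level reading one block, in milliseconds."""
    return block_bytes * 8 / bits_per_second * 1000.0


ASSUMED_RATE = 250_000               # bits per second (assumption)
small_block = block_transfer_ms(256, ASSUMED_RATE)
large_block = block_transfer_ms(512, ASSUMED_RATE)
```

Under these assumptions the two blocks hold the processor at interrupt level for about 8.2 ms and 16.4 ms respectively, both inside the 5 to 20 ms range quoted above.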

For a high-speed, digital-cassette 
system, a similar procedure is applicable, 
except that instead of reading an entire 
record at the interrupt level, only a single
byte is read. Thus, during any data 
transfer, an interrupt will occur every 1 to 
2 ms, at which time the service routine 
will read or write 1 byte to or from a buf- 
fer, and return. When the entire buffer is 
finished, the background program will be 
reenabled to perform the data manipula- 
tion. 


Printer/Communications Link 

This interface is similar to the digital 
cassette: only a single byte is transferred 
at the interrupt level and the higher-level 
driver programs are run at a medium- or 
low-priority level. It is the task of the 
higher-level program to initially format 
the buffer and initialize the byte count (in 
the case of sending a message), or to 
reserve a buffer space and initialize
pointers (in the case of receiving a 
message). This leaves the most simple 
tasks to be performed at interrupt level — 
merely reading the device, indexing the 
buffer, saving the data, and returning. Ac- 
tually, there is no need to save full status 
when interrupted, since the processor 
will only require a few general registers to 
perform the tasks above. At the comple- 
tion of the message the original calling 
level is reenabled. Naturally, as with the 


high-speed peripherals, most of the work 
is done by the background program that 
handles the data. 
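
A one-byte-per-interrupt service routine of the kind described for the cassette and the printer link might look like the following sketch. The ByteLink class and its field names are hypothetical; the sent array merely stands in for the device data port, and the caller_enabled flag stands in for reenabling the calling level.

```python
class ByteLink:
    """Hypothetical send side of a printer or communications link."""

    def __init__(self, message):
        self.buffer = bytearray(message)   # formatted by the higher-level program
        self.index = 0
        self.remaining = len(self.buffer)  # byte count set by the higher level
        self.sent = bytearray()            # stands in for the device data port
        self.caller_enabled = False        # calling level is suspended meanwhile

    def transmit_isr(self):
        """Run once per device-ready interrupt: move one byte and return."""
        if self.remaining == 0:
            return
        self.sent.append(self.buffer[self.index])
        self.index += 1
        self.remaining -= 1
        if self.remaining == 0:
            self.caller_enabled = True     # message done: reenable the caller


link = ByteLink(b"HELLO")
for _ in range(4):
    link.transmit_isr()
enabled_before_last_byte = link.caller_enabled
link.transmit_isr()
```

The routine touches only an index, a count, and one data byte, which is why only a few general registers need to be saved when the interrupt arrives.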


Direct Memory Access Interface 

The description of a direct memory ac- 
cess (DMA) interface completes this 
presentation. This kind of interface in- 
volves a relatively complex hardware 
configuration and therefore may not be 
as practical as the previously described 
interfaces for the home computer. 

Basically, DMA uses the external 
device to request direct access to the 
system’s address and data busses, while 
cutting off the processor from these 
busses. The device is then free to transfer 
data to or from memory, independent of 
the processor. The advantage, of course, 
is the efficiency and speed of the opera- 
tion; the drawback is the amount of hard- 
ware needed. 

The interface to the executive is very 
similar to the disk interface mentioned 
earlier and relies on the DMA controller 
to be interrupt-driven. 

A high-level program in the executive 
will initiate the procedure by requesting a 
DMA transfer (usually done with an I/O 
command). The program and level are 
then suspended, and the processor waits 
for the DMA interrupt. The DMA con- 
troller takes over completely at this point 
by commanding the device to seek the 
required data, read it, and then, one byte 
at a time, perform a DMA transfer to ex- 
ternal memory. When the entire sector 
or block is read, the DMA interrupt gives 
control back to the executive, which 
causes the original user program to be ac- 
tivated and reentered on a priority basis. 

The DMA type of operation is most ap- 
plicable for the case of the intelligent ter- 
minal type of I/O peripheral, which has 
its own separate processor to control the 
device and the DMA operation. Most effi- 
cient, this type of operation is the 
ultimate for peripheral interface to the 
executive. Although it is a very conve- 
nient feature, it is not usually necessary 
for the typical home computer system, 
where normal I/O interface methods suf- 
fice. 


Summary 

Real-time processing has many 
applications with respect to the home 
computer system. To further enhance a 
simple real-time system, a priority
executive may also be implemented, giv- 
ing significantly powerful capabilities to 
the relatively small and inexpensive per- 
sonal computer system. 


For those users who are seeking a permanent home computer, the real-time
executive offers the ideal operating system, because it allows the
computer to be used in the normal dedicated use simultaneously with
various monitoring functions. Such a system will allow this particular
user to obtain the most from his computer investment.

For the designer or experimenter who uses his computer strictly in the
workshop, the benefits of a real-time executive may not be as great.
However, the inventive designer could undoubtedly find a number of uses
for the


executive in his workshop as an ex- 
perimental or practical aid. 

The executive presented here, unfor- 
tunately, only lays the groundwork for a 
complete and practical system, and 
much work is left to the end user, who 
must interface and write his own custom 
programs to put the executive to work. 
But hopefully, this presentation will 
enlighten some owners of personal com- 
puters to the powerful capabilities these 
machines possess and will encourage 
them to implement or experiment with 
real-time executives to explore the 
capabilities of their system.


Multiprogramming
Simplified


Irwin Lahasky


Multiprogramming is the ability of the computer's operating system to
handle and execute several programs concurrently. In this article, I
have set out to explain in a simple fashion the concept of how the
operating system of a computer handles more than one job (ie: program)
at one time. Only the essential elements are included in this simple
model, which is based on a typical large-scale computer's programming
environment. The same general concepts are of course applicable as well
to the much smaller memory regions of the typical personal computer.

The operating system, through its various control programs, keeps track
of the amount and location of available memory and the specific memory
regions allocated to programs currently in memory. As programs (ie:
tasks) are read into the computer, certain information associated with
them is stored by the computer. The name of the program and the location
where the above information about the program is stored is placed on a
list called the job, or task, queue. (See figure 1.) As memory becomes
available, programs to be executed are loaded into memory (figures 2a
and 2b) according to their size and arrival time (ie: how long they have
been waiting). Information regarding these programs, such as name and
location, is placed on the ready queue. As processing continues,
programs are categorized as either active, ready or waiting. Only one
program at a time can be active.



The operating system maintains a
special memory location for each pro-
gram in memory which contains the
address of the next sequential
instruction (NSI) to be ex-
ecuted for that program. This memory
location is called the NSI cell. As a pro- 


Program Counter = xxxxxx 


Figure 1: When a program is entered into the multiprogramming com- 
puter, it is first put onto a job queue. The jobs are typically stored in the 
order they are entered, and each queue entry has all the essential infor- 
mation about the job. 


NSI Cells

A    100,000
B    200,000
C    350,000


Figure 2a: Enough memory is available to fit the first three programs in- 
to the ready queue so they can await execution. The next sequential in- 
struction (NSI) cell for each of the programs is initialized to the location 
of the first instruction in the corresponding program. 




Addresses              Contents

000,000 to  99,999     Operating System Programs
100,000 to 199,999     Program A
200,000 to 349,999     Program B
350,000 to 399,999     Program C
400,000 to 449,999     Unused


Figure 2b: A representation of where the programs are actually stored in
memory. Addresses 0 thru 99,999 (in decimal notation) are used by the
operating system of this example.


NSI Cells

A    100,000
B    200,000
C    350,000

Program Counter = 100000


Figure 3: Program A is moved into the active queue to be executed. The
next-sequential-instruction pointer is moved from the appropriate NSI
cell to the program counter upon activation.


NSI Cells

A    100,000
B    200,000
C    350,000

Program Counter = 100002


Figure 4: The program counter is here incremented by 2, since the first
instruction of program A is a 2-byte instruction. This has no effect on
the related NSI cell.


gram is loaded into memory, the address 
of the first instruction to be executed for 
that program is moved into this NSI cell. 
A special NSI register, the program 
counter, is maintained by the hardware 
containing the address of the next-se- 
quential-instruction to be executed for 
the currently active program. When a 
program becomes active, the next- 
sequential-instruction pointer is moved 
from its NSI cell to the program counter 
of the computer; this will of course be 
dynamically changing for the currently 
active program. As instructions for the
active program are executed, the value
of the program counter is typically
incremented by the length of the current
instruction being executed to reflect the
address of the next instruction
that is to be executed. When branches
occur, the program counter is redefined 
completely. This process is repeated un- 
til the program is either completed or in- 
terrupted by an outside service request 
from a real-time clock or input/output 
(I/O) operation. If the active program 
has been completed, its memory alloca- 
tion is freed and becomes available for 
reallocation. If it was interrupted it will 
be placed on the waiting queue and its 
next-sequential-instruction pointer will 
be defined by the old program counter 
value at the time of interrupt. The 
highest priority program in the ready 
queue will be given active status, its NSI 
cell will be moved to the program 
counter, and instruction execution will 
be resumed at its NSI address. 

As I/O requests are serviced, programs 
will be moved from the waiting queue to 
the ready queue, and will be returned to 
active status when their turn comes. Ex- 
ample: Program A (100 K bytes), pro- 
gram B (150 K bytes), program C (50 K 
bytes), program D (150 K bytes) and pro- 
gram E (100 K bytes) are read into the 
computer and placed on the job queue. 
(See figure 1.) 350 K bytes of memory are 
available beginning at address location 
decimal 100,000. Addresses 0 thru 
99,999 may contain operating system 
programs. Program A is loaded into
locations 100,000 to 199,999, its NSI pointer
is set to its first instruction to be ex- 
ecuted (address 100,000), and it is placed 
in the ready queue. Program B is loaded 
into locations 200,000 to 349,999, its NSI 
pointer is set to its first instruction ad- 
dress of 200,000, and it is placed second 
in the ready queue. Program C is loaded 
into locations 350,000 to 399,999, its NSI
pointer is set to 350,000 and it is placed 
third in the ready queue. 50 K bytes remain available in memory from
addresses 400,000 to 449,999, but this is insufficient for either of the
remaining programs (D and E), which require 150 K and 100 K bytes,
respectively. Therefore this memory will remain temporarily unused. (See
figures 2a and 2b.)
If there is no entry in the active queue, the first program in the ready
queue, program A, is moved to active status. Its NSI cell is moved to
the program counter (as shown in figure 3), and execution will begin at
that address. Program B now becomes first on the ready queue and program
C second. As the instruction at location 100,000 is fetched and
executed, the program-counter value changes.

Assuming the first instruction is 2 bytes long, the next sequential
instruction to be executed becomes 100,002. (See figure 4.) After the
execution of the instruction at location 100,000 is completed, the
instruction pointed to by the program counter (at location 100,002) is
fetched for execution, and the program counter is changed to 100,004.
This is because the second instruction is also 2 bytes long.
Processing continues in this manner until an interrupt in processing is
encountered, such as a request to read data into the program from an
input device or a request to write data to an output device. In this
case, time is required to get or write the external data, so control is
transferred to another program in the following manner. For the sake of
our example, let us assume that program A issued a read instruction
located at address 158,266, and that this instruction type is 6 bytes
long. The program counter which had been pointing to address 158,266
will be incremented by 6 to 158,272 (figure 5). An interrupt is
generated by this I/O instruction. The program counter contains the
address where execution is to be resumed for program A (158,272). The
program counter is stored in program A's NSI cell, and program A is
placed last on the waiting queue. The next program in the ready queue is
moved to active status, its NSI cell is moved to the program counter,
and processing is continued at the (now different) address in the
program counter. (See figure 6.)
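
The bookkeeping in this example can be followed with a small model. The sketch below is written in Python purely for illustration; the Scheduler class and its method names are assumptions, not part of any real operating system. It uses the addresses of the example: three programs loaded at 100,000, 200,000 and 350,000, with program A issuing a 6-byte read instruction at 158,266.

```python
from collections import deque


class Scheduler:
    """Hypothetical model of the NSI cells, queues and program counter."""

    def __init__(self, nsi_cells):
        self.nsi = dict(nsi_cells)       # one NSI cell per loaded program
        self.ready = deque(self.nsi)     # ready queue, in load order
        self.waiting = deque()
        self.active = None
        self.pc = None                   # the hardware program counter

    def dispatch(self):
        """Activate the first ready program and load its NSI cell."""
        self.active = self.ready.popleft()
        self.pc = self.nsi[self.active]

    def execute(self, length):
        """Advance the program counter past one instruction."""
        self.pc += length

    def io_interrupt(self):
        """Save the program counter in the NSI cell and switch programs."""
        self.nsi[self.active] = self.pc
        self.waiting.append(self.active)
        self.dispatch()


s = Scheduler({"A": 100_000, "B": 200_000, "C": 350_000})
s.dispatch()          # program A becomes active at 100,000
s.execute(2)          # first instruction is 2 bytes long
s.execute(2)          # so is the second
s.pc = 158_266        # stand-in for the rest of program A's execution
s.execute(6)          # the read instruction is 6 bytes long
s.io_interrupt()      # A waits for its data; B takes over
```

After the last call, program A's NSI cell holds 158,272, program A is on the waiting queue, and the program counter holds program B's starting address of 200,000.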
As jobs are completed and their memory allocation is freed, programs
waiting on the job queue are loaded into available locations. Their NSI
cells are initialized and they are placed last on the ready queue.


NSI Cells

A    158,272
B    200,000
C    350,000

Program Counter = 158,272


Figure 5: When program A reaches memory location 158,266, a read in- 
struction is encountered that is 6 bytes long. The program counter is in- 
cremented by 6 to 158,272. While the read operation takes place, pro- 
gram A is removed from the active queue and put into the waiting 
queue. The current value of the program counter is then stored in the A 
NSI cell as shown here. 


Active Queue    NSI Cells

A    158,272
B    200,000
C    350,000

Program Counter = 200,000


Figure 6: The next program on the ready queue is moved to the active
queue after an interrupt. The appropriate NSI cell is moved to the
program counter, (re)starting the program on its way.


Waiting Queue    NSI Cells

A    158,272
B    248,208
E    350,000

Program Counter = 158,272


Figure 7: At this point program C has finished and has been removed 
from memory. There is not enough room for program D but there is for 
program E, which is loaded into memory and the ready queue. Program 
E’s NSI cell is set to its first instruction’s location. Program A’s read 
operation has finished and program A is again in the active queue. The 
programs will continue to shift in and out of the active status as they 
are interrupted, until the entire series, and any that are read in later, is 
completed. 




Programs are loaded into memory ac- 
cording to their position on the job 
queue, their memory needs, and 
availability of core. If program C, which 
occupies 50 K bytes, finishes first, its 
memory allocation of 50 K bytes plus 
the 50 K bytes which is unused (total 100 
K bytes) is not sufficient for program D 
(which needs 150 K bytes) even though 
program D is next in line on the job 
queue. The 100 K bytes of available 
memory is sufficient for program E, so it 
is loaded into locations 350,000 thru 


449,999. Its NSI cell is then set to 
350,000 and it is placed last on the ready 
queue. (See figure 7.) 
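
The loading rule just described (take jobs in job-queue order, but skip any job that does not fit) can be sketched as follows. This is a minimal Python illustration using the sizes of the example, with K taken as 1,000 bytes as in the addresses above. The function names and the free-list representation are assumptions made for the sketch.

```python
def load_jobs(job_queue, free_regions, nsi_cells, ready_queue):
    """Load every queued job that fits into a free region, in queue order."""
    for job in list(job_queue):
        name, size = job
        for i, (start, length) in enumerate(free_regions):
            if size <= length:
                nsi_cells[name] = start          # first instruction address
                ready_queue.append(name)
                job_queue.remove(job)
                if size == length:
                    del free_regions[i]
                else:
                    free_regions[i] = (start + size, length - size)
                break


def release(free_regions, start, length):
    """Return a region to the free list and merge regions that touch."""
    free_regions.append((start, length))
    free_regions.sort()
    merged = [free_regions[0]]
    for s, n in free_regions[1:]:
        last_s, last_n = merged[-1]
        if last_s + last_n == s:
            merged[-1] = (last_s, last_n + n)
        else:
            merged.append((s, n))
    free_regions[:] = merged


jobs = [("A", 100_000), ("B", 150_000), ("C", 50_000),
        ("D", 150_000), ("E", 100_000)]
free = [(100_000, 350_000)]          # 350 K bytes starting at 100,000
nsi, ready = {}, []

load_jobs(jobs, free, nsi, ready)    # A, B and C fit; D and E must wait
release(free, 350_000, 50_000)       # program C finishes
load_jobs(jobs, free, nsi, ready)    # D still does not fit, but E does
```

Program D stays at the head of the job queue throughout, which shows the price of this simple rule: a large job can be passed over repeatedly by smaller ones behind it.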

This idea of multiprogramming has 
developed over a number of years of 
conventional computing systems, rang- 
ing from the simplicity of two interact- 
ing programs on small machines to the 
larger contexts of many jobs executing 
simultaneously on the biggest machines. 
It is an example of how creative pro- 
gramming and design of systems soft- 
ware can make a machine do more than 
what the hardware designer intended. 


Introduction to
Multiprogramming


Mark Dahmke


Multiprogramming has usually been considered out of reach of the average
personal computer experimenter using a small- or medium-scale computer.
Actually, anyone with a processor above the level of an 8008 can operate
a multiprogram or multiuser system. The original purpose of
multiprogramming was to allow more than one user to take advantage of a
computer simultaneously. This increased the productivity of the machine
by allowing programs to run while other programs were awaiting user
input, access to a disk, etc.

This may seem to conflict with the advantages inherent in
microprocessor-based systems (ie: single-user systems and low cost).
However, there are many instances where the ability to run more than one
program at a time may be advantageous. Note that the statement "more
than one program may run at a time" does not mean simultaneous
execution. That is the definition of multiprocessing. To describe
multiprogramming more effectively, I shall refer to a more well-known
function in computers: real-time interrupts. Suppose we are using a
microcomputer to manage the environment in a small office building.
Normally we want to continually poll (ie: scan) the sensors that are
distributed


throughout the building and adjust 
heating, cooling and lights on the basis 
of temperature and time of day. Let us 
say that during normal operation, some- 
one in the building wants to change the 
temperature of an office. 

One way to do this is to have a video
terminal and keyboard attached to the
system that generates an interrupt when
a keyboard request is made. Upon
receiving the interrupt, the computer
saves the status of the current program
and transfers control to the
keyboard read routine. As soon as the
user has made the desired change, the
system loads the old status
information and returns to the original
program. This same interrupt technique
could be used to design a time-shared 
system that would allow several ter- 
minals to be hooked up to a processor. 
Each terminal would generate an inter- 
rupt, and whichever program was active 
would be put in a wait state. This 
arrangement only works well for a few 
terminals, though. You can imagine 
what would happen if everyone 
happened to press a key at the same 
time. 

Figure 1 shows timing comparisons of 
several modes of operation already 
discussed. In figure 1a two independent 


[Timing diagrams, horizontal axis: time. (a) Processor 1 and processor 2,
each running its own program. (b) Master runs program A, initiates a
slave program, continues with other work, and resumes when the slave
sends a ready signal; the slave is inactive or doing other work before
and after. (c) Program A, external interrupt occurs, interrupt routine,
program A resumed. (d) Operating system alternating with service of
terminals 6 and 7 until each is completed.]


Figure 1: Timing diagrams for four different system organizations. Figure 1a is a multiprocessing example using two in- 
dependent processors. Figure 1b is a multiprocessing example using two processors connected in a master-slave con- 
figuration. Figure 1c is a single processor with one level of interrupt. Figure 1d is a single processor with eight levels of 
interrupts. Each of the eight levels is activated by one of eight terminals. 


processors are shown, each doing 
something different and neither interfer- 
ing with the other. This is known as 
multiprocessing. The processors may or 
may not be sharing input/output (I/O) 
terminals or memory. 

In figure 1b two processors are shown 
in a master-slave arrangement. Perhaps 
the slave processor performs floating 
point arithmetic or some complex I/O 
function. The master processor can give 
the slave processor commands via an in- 
terrupt and continue other processing 
until the slave informs it that it has 
finished the desired operation. 

Figure 1c shows a single processor 
with an interrupt being applied. The pro- 
cessor temporarily gives control to the 
routine specified by the interrupt hard- 
ware and begins executing it. When 
complete, it returns control to the main 
program. Figure 1d shows the multiter- 
minal timeshare system. Usually the in- 
terrupt hardware contains provisions for 


daisy chaining or giving priority to the 
interrupts as they come in. Thus, if ter- 
minal 6 applies an interrupt and the pro- 
cessor is busy with terminal 7, terminal 6 
is not allowed to interrupt the processor 
until terminal 7 is finished. 

Using multiprogramming is like using
real-time interrupts. A multiprogrammed
system uses interrupts, but
in a more efficient way. Imagine a simple
two-program situation. Suppose program
A is running and no other programs
have been started. Then a user initiates 
(ie: loads) another program called B. 
How will program B gain control of the 
system so that it might start to execute? 

The process of passing control from one program to the next is usually
handled by an operating-system module referred to as an interrupt call
routine. Normally, to save the programmer the trouble of making sure
that this routine gets called at regular intervals, the routine is
imbedded in many of the I/O routines or other standard utility routines
on a system. Note that this technique will in no way upset any of the
flags or registers of the routine it is called from.

The interrupt call program will:


1. Determine if any other programs are waiting to execute.

2. If so, save all registers and flags on the stack and save the address
   of the current program's stack pointer in a special table in memory.

3. Load the new program's stack pointer from the table, pop all
   registers and flags off the stack.

4. Return to the new program.


Loading the new stack pointer raises some interesting questions. If
program B has not yet begun, how could its registers have been pushed
onto its stack? Figure 2 shows the stacks of both programs as they would
be at each step of the previously described interrupt call routine. Part
of the job of the routine that initialized program B is to set up a
dummy stack and stack pointer such that the program counter address on
the top of the stack contains the entry point

[Figure 2: stack A and stack B, with stack pointers A and B and the
stack pointer save locations, shown at steps (2) and (4) of the
interrupt call routine. After the interrupt call, each saved stack holds
the program counter (high byte and low byte), all registers, and the
flags.]


of program B. Thus, when the interrupt
call routine reaches step 4, it will
execute a return instruction, which pops the
entry point address off the stack and
begins executing program B. When the
interrupt routine is called again, it will see
that program A is waiting and will save

[Circuit diagram labels: 10 kHz clock input, D flip-flop, three-state
output to the interrupt line of the processor, DISABLE input.]


Figure 3: Simple hardware interrupt timer set for 10 ms intervals. 


[Timing diagrams. (a) Programs A and B alternate under timer interrupts
and interrupt calls, with the timer reset by software at each call.
(b) Programs A and B alternate under timer interrupts and I/O interrupt
calls, with the timer not reset.]


Figure 4: Interrupt timing example of figure 1 reviewed with the addition of a hardware timer. The timer may be used in 
two ways. The example in figure 4a resets the timer on each interrupt call. This allows each program to receive its full 
10 ms time slot. The example in figure 4b does not reset the timer. Therefore, a hardware interrupt occurs every 10 ms. 


all of program B’s registers and flags, 
swap stack pointers and return to pro- 
gram A at the point where it was first in- 
terrupted. 
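
The four steps of the interrupt call routine, together with the dummy stack trick, can be modeled as below. This is an illustrative Python sketch and not the code of any actual system: a Python list stands in for each stack, a dictionary stands in for the table of saved stack pointers, and the register names and addresses are invented. One assumption is worth stating: since step 3 pops registers and flags, the dummy stack here holds blank registers and flags above the entry point address.

```python
def make_dummy_stack(entry_point):
    """Stack for a task that has never run (end of list = top of stack)."""
    return [entry_point, {"A": 0, "B": 0}, 0]   # return address, registers, flags


class Processor:
    """Hypothetical single processor with one active stack."""

    def __init__(self):
        self.pc = 0
        self.registers = {"A": 0, "B": 0}
        self.flags = 0
        self.stack = []            # the stack the stack pointer selects
        self.current = None
        self.saved_stacks = {}     # special table of saved stack pointers

    def add_task(self, name, entry_point):
        self.saved_stacks[name] = make_dummy_stack(entry_point)

    def _restore(self, name):
        self.stack = self.saved_stacks.pop(name)  # step 3: load stack pointer
        self.flags = self.stack.pop()             #         pop flags
        self.registers = self.stack.pop()         #         pop registers
        self.pc = self.stack.pop()                # step 4: return pops address
        self.current = name

    def start(self, name):
        self._restore(name)

    def interrupt_call(self, return_address):
        if not self.saved_stacks:                 # step 1: nobody is waiting
            return
        self.stack.append(return_address)         # pushed by the call itself
        self.stack.append(dict(self.registers))   # step 2: save registers
        self.stack.append(self.flags)             #         and flags
        self.saved_stacks[self.current] = self.stack
        waiting = next(n for n in self.saved_stacks if n != self.current)
        self._restore(waiting)


cpu = Processor()
cpu.add_task("A", 0x1000)
cpu.add_task("B", 0x2000)
cpu.start("A")
cpu.registers["A"] = 7
cpu.interrupt_call(return_address=0x1234)   # A gives up the processor
after_first = (cpu.current, cpu.pc, cpu.registers["A"])
cpu.interrupt_call(return_address=0x2050)   # B gives it back
after_second = (cpu.current, cpu.pc, cpu.registers["A"])
```

The first call starts program B at its entry point with blank registers; the second resumes program A at the address following its call, with its register contents intact.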

All this activity will take place every 
time the interrupt routine is called, but 
if one of the programs gets caught in an 
infinite loop, the interrupt call routine 
may not get called. The simplest way to 
avoid this kind of problem is to add 
some hardware to provide external 
timed interrupts. As shown in figure 3, the 
interrupt timer is set to provide an inter- 
rupt every 10 ms. A reset line is provided 
in the event that the interrupt routine is 
manually called through the software 
method. The timer may be reset to give 
the program its full 10 ms. A disable line 
is provided to allow the user to turn off 
the timer for special applications (eg: 
software timing) in which the processor 
must not be interrupted. 

Figure 4 shows our previous example 
of figure 1, but with the extra hardware- 
generated interrupts added. In figure 4a 
some software interrupts are mixed in 
with the hardware interrupts. The timer 
is reset after each call to the interrupt 
routine. Figure 4b is the same except 
that the timer is not reset after each call. 
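
The difference between the two timer policies comes down to when the next hardware interrupt arrives after a software interrupt call. The short Python sketch below makes this concrete; the function name is an assumption, and the call time of 4 ms is an arbitrary figure picked for the example.

```python
TIME_SLOT_MS = 10   # interval of the hardware interrupt timer


def next_timer_interrupt(call_time_ms, reset_on_call):
    """Time of the next timer interrupt after an interrupt call."""
    if reset_on_call:
        # Figure 4a: the new program gets a full time slot.
        return call_time_ms + TIME_SLOT_MS
    # Figure 4b: the timer keeps ticking on fixed 10 ms boundaries.
    return (call_time_ms // TIME_SLOT_MS + 1) * TIME_SLOT_MS


with_reset = next_timer_interrupt(4, reset_on_call=True)
without_reset = next_timer_interrupt(4, reset_on_call=False)
```

With a call at 4 ms, the program that gains control runs until 14 ms if the timer is reset, but only until 10 ms if it is not, so it receives a shortened 6 ms slot.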


A Complete System 


There are limitless ways to develop a computer system that will be easy
to use. A look at the current market shows this to be true, perhaps even
to a greater extent on the small systems level. I will not attempt to
describe all possible variations available on a multiprogramming system,
but I will try to give as generalized a view as possible.

First, we must consider what is 
necessary to make a useful system. The 
following are essential: 


1. Some form of operating system that
allows simplified user communications
(eg: BASIC, DOS, CP/M).

2. Convenient mass storage I/O (eg:
cassette or disk).

3. Sufficient memory to handle all programs.


Another consideration might be the 
internal architecture of the processor, 
but that is another level of problem. 

Figure 5 shows the memory layout of a typical multiprogramming system.
To maintain a simple system, I have combined the operating system with
the time-sharing routines that support all terminals, such as video
displays, keyboards and teletypewriters. This means that each time the
operating system gains control, through an interrupt call or timer
interrupt, it will complete its own activity and then transfer control
to the time-sharing program for the remainder of the time slot. If the
operating system is given highest priority, the response times of the
terminals should not suffer. The operation of the time-share program can
be treated as a multiprogram system in miniature, where each terminal is
given a time slot, or it may be designed to simply scan the terminals,
choosing a new terminal each time it is given control.


Controlling I/O


Many programmers have discovered the convenience of vectoring all I/O
through one subroutine; this simplifies programming greatly and makes
system changes much easier. Typically, one subroutine will accept an
operand, if necessary, and an operator function code passed from the
main program and will decide which I/O function to perform. In some
large computer systems, the I/O driver programs can only be accessed by
executing a special kind of interrupt call that informs the operating
system that the user's program desires to perform some kind of input or
output operation. The operating system then takes charge, performs the
I/O for the program in question, and returns pointers, telling where the
input data was stored in memory or that the requested output function
has been completed.

This type of I/O handling is necessary because the I/O controllers are
extremely complex and are capable of performing an entire I/O operation
without processor intervention. In fact, it would be inefficient to make
the processor of a large system perform these menial tasks when it could
be working on more important programs. In microcomputer systems we are
not normally concerned with the optimization of I/O functions, and it
does not hurt performance to have the processor perform most of the
work. Consequently, the I/O driver routines in the system I am
describing will not be considered as part of the operating system. They
are just utility subroutines that may be called by the user programs.


Defining the Necessary Tables


With only two programs very few, if
any, tables are needed to tell the
interrupt routine which program was active
at the instant the system was interrupted
and which program is next in line. But
imagine a system capable of supporting
more programs: some form of

priority scheduling will be needed, as 
well as a table to hold all of the stack 
pointers of the inactive programs. 

To handle the list of programs (herein 
referred to as tasks), we must define a 
task-control table that keeps track of a 
number of pointers and descriptors. 
First, each entry will begin with the task number that uniquely defines
each task. Next, we will include the priority of the task on an
arbitrary scale of 0 to 10, where 10 is highest. A task gets the
processor before any task of lower priority. If two tasks have the same
priority, the first one in line in the task-control table will get
control. The task-control table must also keep track of the last value
of the stack pointer of each task and whether or not the task may be
interrupted, as in the case of critical timing loops.
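
One possible shape for such a table is sketched below in Python. The field names, the stack pointer values and the priorities in the sample table are invented for the illustration; only the selection rule (highest priority first, earliest table entry on a tie) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class TaskEntry:
    task_number: int
    priority: int                # 0 (lowest) to 10 (highest)
    stack_pointer: int           # last saved stack pointer of the task
    interruptible: bool = True   # False during critical timing loops


def highest_priority(table):
    """Pick the highest-priority entry; on a tie the earlier entry wins."""
    best = None
    for entry in table:
        if best is None or entry.priority > best.priority:
            best = entry
    return best


table = [
    TaskEntry(task_number=0, priority=10, stack_pointer=0xF000),
    TaskEntry(task_number=1, priority=5, stack_pointer=0xE000),
    TaskEntry(task_number=2, priority=10, stack_pointer=0xD000),
]
chosen = highest_priority(table)
```

Tasks 0 and 2 share the top priority here, so task 0 is chosen because it comes first in the table. The strict greater-than comparison is what gives the earlier entry the benefit of a tie.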

Another important status byte that 
must be kept is the current activity indi- 
cator. This byte contains the task 
number of the currently active task. 
Now let us assume that we have three 
different tasks running and all have been 
initialized (ie: stored in the task-control 
table). The first task has a task number 
of 0 and a priority of 10. Generally the 
operating system is given the task
number 0 designation. Since the 
operating system and the time-sharing 
program that controls the user terminals 
are considered one big program in this 
example, task 0 is also the designation of 
the time-share system. Task 1 is a pro- 
gram that one of the users submitted (ie: 
initiated) from a terminal; it has a priori- 
ty of 10. Task 2 was also loaded and in- 
itiated by a user through the time-share 
terminals, and it has a priority of 10. 

Figure 5: System geography of a typical multiprogramming system with
space for the operating system and two other programs. The regions shown
are the bootstrap loader, the interrupt routine, the time-sharing support
program and operating system (task 0), user program 1, user program 2,
all I/O routines, and stacks 0, 1, and 2.

Imagine that the time-share program
calls the I/O driver program to write a
character out to a terminal. Since there
could be many terminals connected to 
the system, how does the program know 
which one to write to? It would be very 
inefficient to have different routines for 
each device, but the only way that a pro- 
gram could tell the I/O driver which 
specific display to write to is for the call- 
ing program to know the physical ad- 
dress of that terminal. Passing the actual 
address of the device ruins the neatness 
of the I/O routine, though. It is more 
convenient to specify the function to be 
performed (1 = write to video display; 2 
= read keyboard; 3 = write to cassette; 
4 = read cassette). 

The solution is to have another entry
in the task-control table called a communications
control-block pointer that
points to the location of the communications
control block for the particular
task. Since each task is given its
own block, the user may define his or 
her own functions and addresses. Thus 
each program may have its own video 
display, keyboard, cassette interface 
and disk. The communications control 
block contains a list of function 
numbers, the address of the I/O port or 
memory-mapped port, and the address 
of the I/O subroutine that will perform 
the operation. Figure 6 shows the ar- 
rangement of all tables. 
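
As a rough sketch of how an I/O driver might use the communications control block, the Python below keeps one block per task and looks up the routine and port for a function code. The port numbers, the FAKE_PORTS and screen stand-ins for hardware, and the name io_driver are all assumptions made for illustration; only the function codes (1 = write to video display, 2 = read keyboard) come from the text.

```python
# Hypothetical stand-ins for hardware: pending key codes by port, and a log of display writes.
FAKE_PORTS = {0x10: ord("A"), 0x20: ord("B")}
screen = []

def read_keyboard(port, _data=None):
    """Common keyboard read routine: returns the ASCII code waiting at the port."""
    return FAKE_PORTS[port]

def write_video(port, data):
    """Common display routine: records the character written to the port."""
    screen.append((port, data))

# One communications control block per task: (function code, I/O routine, port address).
CCB = {
    0: [(1, write_video, 0xE0), (2, read_keyboard, 0x10)],
    1: [(1, write_video, 0xE1), (2, read_keyboard, 0x20)],
}

def io_driver(current_task, function_code, data=None):
    """Find the calling task's block, then the entry for the requested function."""
    for code, routine, port in CCB[current_task]:
        if code == function_code:
            return routine(port, data)
    raise LookupError(f"task {current_task} has no function {function_code}")
```

The calling program passes only a function code; the port address never appears in the program itself, so two tasks can issue the same request and reach different keyboards.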


Starting and Stopping

To initialize a new task, the user adds
entries to the appropriate tables through
a console command, causing a dummy
stack and stack pointer to be created. To
stop a task, the last thing done in the
task is to call a subroutine that would
remove its task-control table entry. This
is equivalent to a CALL EXIT in FORTRAN
found on many larger systems.

Figure 6: Control table organization. The current activity indicator contains the task number of the active task. The
task-control table contains the task number, task priority, last value of stack pointer, interrupt status flag (1 for yes, 0
for no interrupts), and the pointer to the task's communications control block. The communications control block contains
the input/output (I/O) function code, address of I/O driver routine associated with the function code, and the I/O
port or memory mapped address assigned to the task for the particular function. One entry is provided for each function
code used in the task. Each table ends with an end-of-table marker (hexadecimal FF). The owner of the task may add
entries to the communications control block for specialized I/O driver requirements.


The easiest way to show how all
tables and pointers affect each other
and the system is to observe them during
a short period of machine activity. As we
begin, task 0 (the operating system and
time-share routines) has control, and a
timer interrupt is occurring. There are
two other tasks in memory: task 1 has
priority 5 and task 2 has priority 4.

First, as the interrupt routine is
entered, it saves all registers and flags of
task 0 on stack 0 and saves the task 0
stack pointer in the task 0 task-control
table entry. (See figure 7.) Next, it scans
the task-control table for the task of
next highest priority, moves the new task
number (task 1) to the current activity indicator,
moves the task 1 stack pointer
from the task-control table to the processor's
stack pointer, pops all of task
1's registers and flags off of stack 1, and
executes a return, which has the effect
of popping the program counter and
jumping to that address.
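
The switch just described can be simulated in Python as a minimal sketch. Several things here are assumptions rather than the author's design: stacks are modeled as Python lists, the saved stack pointer is modeled as the stack depth, and "next highest priority" is read as "the next task in descending priority order, wrapping around to the top," which matches the sequence task 0, task 1, task 2 in the example.

```python
def timer_interrupt(tct, stacks, indicator, registers):
    """Simulate one timer interrupt; returns (new indicator, restored registers)."""
    current = next(e for e in tct if e["task"] == indicator)
    stacks[current["task"]].append(registers)           # push registers and flags
    current["sp"] = len(stacks[current["task"]])        # save stack pointer in the table entry
    order = sorted(tct, key=lambda e: -e["priority"])   # stable sort: table order breaks ties
    chosen = order[(order.index(current) + 1) % len(order)]
    restored = stacks[chosen["task"]].pop()             # pop the chosen task's registers
    chosen["sp"] = len(stacks[chosen["task"]])
    return chosen["task"], restored

# Priorities from the example; tasks 1 and 2 start with a dummy stack frame.
tct = [{"task": 0, "priority": 10, "sp": 0},
       {"task": 1, "priority": 5, "sp": 1},
       {"task": 2, "priority": 4, "sp": 1}]
stacks = {0: [], 1: [{"pc": 0x2000}], 2: [{"pc": 0x4000}]}

indicator, regs = timer_interrupt(tct, stacks, 0, {"pc": 0x0100})
```

After this first interrupt the indicator names task 1 and the restored register set is the dummy frame created when task 1 was initialized.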
Task 1, while executing, encounters a
call to the I/O driver routine with a request
for a keyboard input. (See figure
8.) When the I/O driver routine is
called, it scans the task-control table
to find the communication control-block
pointer entry for task 1 (ie: it
determines which task called it
by looking at the current activity indicator),
then scans the communication
control block for the function number
corresponding to the one passed
by the main program. Even though the
computer may have five or more keyboards
attached to it, the port address
in the communication control
block gives it the address of the
keyboard assigned to task 1.

Since the keyboard read routine is a
common one, the address referred to in
the communication control block points
to a subroutine located within the
operating-system area. Note that if the
user had need for some special I/O
routine, he could locate it in his own
area and put the address in his
communication control block as
another function code.

Returning to the example, the
keyboard read subroutine is called from
the I/O driver, reads the keyboard port


assigned to task 1, and returns to the I/O 
driver with the ASCII code. The I/O 
driver returns to the main program with 
the ASCII code in a register or memory 
location. In figure 9 the next timer inter- 
rupt has occurred, so control returns to 
the interrupt-handler routine. Again, the 
interrupt routine saves all registers and 
flags of task 1 on stack 1, looks at the 
current activity indicator to see which 
program was last active, saves the stack 
pointer in the task 1 task-control table 
entry, scans the task-control table for 
the next highest priority task, and finds
that task 2 should get control. The stack
pointer for task 2 is loaded from the
task-control table, all registers and flags
are popped off of stack 2 and again a
return is executed that causes task 2 to
take control.

Figure 7: Task 0 has control of the processor and has just been interrupted.
The interrupt routine looks at all pointers, saves the status, and
then transfers control to task 1.

Figure 8: Task 1 has requested keyboard input from its assigned
keyboard. When the input is completed, the I/O driver returns control
to task 1.

Figure 9: Task 1 has been interrupted and turns control over to the interrupt
routine. Control is then passed to task 2.

Figure 10: Task 2 has completed its execution and encounters a CALL
EXIT. Control is given to the terminator routine which performs some
cleanup operations and removes the task 2 entry from the task-control
table, effectively destroying the task. Control is then given to the interrupt
routine which again scans the task-control table to find the next
task awaiting execution.

In figures 7 through 10, one line style marks transfer of control and
another marks data or pointers.

In the next step (shown in figure 10), 
task 2 has encountered the equivalent of 
a CALL EXIT or STOP command and has 
finished processing. This CALL EXIT 
calls a terminator routine which again 
finds out via the current activity in- 
dicator who called it and simply 
eradicates the task-control table entry 
for that task. To keep things neat, all 
succeeding table entries are moved up 
one notch. Then, control is returned to 
the interrupt handler, which will find the 
next task in line. In this case, since no 
other tasks of lower priority are waiting, 
control is returned to the highest priority 
task 0. 
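
The terminator's table cleanup can be sketched as follows. The dictionary-based entries and the name terminate are hypothetical; the sketch only shows the removal of the caller's entry and the closing of the gap.

```python
def terminate(tct, indicator):
    """Remove the entry of the task named by the current activity indicator.

    Deleting from a Python list moves every later entry up one place,
    which mirrors the "moved up one notch" step in the text.
    """
    for i, entry in enumerate(tct):
        if entry["task"] == indicator:
            del tct[i]
            return
    raise LookupError(f"no task-control table entry for task {indicator}")

# Hypothetical table holding the three tasks of the example.
example_tct = [{"task": 0, "priority": 10},
               {"task": 1, "priority": 5},
               {"task": 2, "priority": 4}]
terminate(example_tct, 2)
```

After the call, only the entries for tasks 0 and 1 remain, and the interrupt handler can scan the shortened table as usual.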


Error Handling 


On a single program system, error 
handling is something that the user can 
watch for manually. When several pro- 
grams are running, the system must have 
routines to handle errors rapidly so that 
other programs will not be slowed down 
or destroyed. There are many common 
errors that are relatively easy to deal 
with. Executing an invalid op code or 
forgetting to put in the second or third 
byte of a multibyte op code can be 
handled through a simple system restart 
(through the interrupt-handler routine) 
without losing continuity. But what 
about a program loop that accidentally 
destroys part or all of another user’s pro- 
gram? On an IBM 360, all memory 
blocks assigned to a task are given a 
unique 4-bit protect key, which is the 
same as the task number. This key is 
stored in external hardware. 

One approach might involve having 
two external 16-bit registers that could 
be loaded by the interrupt routine with 
the high and low memory addresses of 
the active task. Every time the address 
bus has a valid address on it, it is tested 
against these registers. However, special
precautions would have to be taken in
those cases in which a utility in low
memory (ie: I/O driver routine, etc) is
called, or when memory-mapped I/O
ports outside these address limits are
used.
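
The comparison that such external registers would perform can be written out as a small Python sketch. The function name, the address values, and the idea of an explicit list of exempt ranges (for low-memory utilities and memory-mapped ports) are assumptions for illustration; in the scheme described, this test would be done in hardware on every bus cycle.

```python
def access_allowed(address, low, high, exempt=()):
    """True if the address lies within [low, high] or within an exempt range."""
    if low <= address <= high:
        return True
    return any(start <= address <= end for start, end in exempt)

# Hypothetical layout: the active task owns 4000-5FFF hex; the I/O driver
# utilities sit in 0000-0FFF and a memory-mapped port block in F000-F0FF.
LOW, HIGH = 0x4000, 0x5FFF
EXEMPT = [(0x0000, 0x0FFF), (0xF000, 0xF0FF)]
```

A reference to another task's area, such as 6000 hex in this layout, fails the test, while a call into the low-memory utilities passes because of the exempt range.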


Resolving Allocation Conflicts 


Allocating I/O devices has been a
problem since the early days of computers.
Devices like tape drives and card
readers (ie: sequential devices) are
non-shareable: only one program may use
them at a time. However, disk drives are
considered shareable, since the head
may be positioned at random to gather
data. The simplest method that can be
applied to the system described in this
article would be to have the initiator
program check all communication control
blocks to make sure that certain
devices are not assigned more than
once.


As mentioned earlier, I/O techniques
in use on small systems leave all control
up to the processor. If special timing is
needed or if strobes or ready flags have
to be checked, software is used instead
of extra hardware, as in the case of
larger systems. This in itself is good from
the standpoint of economy, but requires
that special care be taken when writing
the driver and controller software.

For example, suppose a cassette read
routine uses a universal asynchronous
receiver transmitter (UART) implemented
in software as an algorithm
instead of hardware. In a nonmultitasking
system, the program may simply
loop and time down between bits, but in
a multitask system the timer interrupt
would surely halt the activity and execute
other programs. It may be well
over 30 ms before it can return to the
cassette read routine. It is easy to see
what can happen to critical timing loops
in a system that uses any kind of interrupts.

The solution? If you must do the
critical timing in software, it is necessary
to turn off the interrupt timer while in
a critical loop and reactivate it when
in noncritical parts of the routine. If external
hardware is used, and internal
timing is reduced to noncritical loops,
the intervention of the multitask interrupt
timer will not normally affect the
system. If the interrupt timer causes an
interrupt just before a byte is received
by the UART, control must return in time
for the next byte to be received. The easiest
way to assure that the cassette read
routine does not drop a byte is to set the
timing of the interrupt oscillator to at
least twice as fast as the transmission
rate of the UART. This greatly reduces
chances of losing a byte.
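
The first remedy, turning the interrupt timer off around a critical loop, can be sketched in Python with a context manager. The IntervalTimer class and the critical_loop helper are hypothetical stand-ins for the hardware timer and for the disable and enable instructions a real routine would use.

```python
from contextlib import contextmanager

class IntervalTimer:
    """Toy model of the multitask interrupt timer."""
    def __init__(self):
        self.enabled = True
        self.switches = 0     # number of task switches the timer has caused

    def tick(self):
        """Called once per timer period; switches tasks only while enabled."""
        if self.enabled:
            self.switches += 1

@contextmanager
def critical_loop(timer):
    """Turn the timer off for the duration of a critical timing loop."""
    timer.enabled = False
    try:
        yield
    finally:
        timer.enabled = True   # always reactivate, even if the loop fails
```

A cassette read routine would wrap only its bit-timing loop in critical_loop, so that the rest of the routine can still be interrupted.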

An alternate approach is to have even
more hardware that forces the interrupt
to time out and return control to
the program awaiting the data transfer
operation when the incoming data is
present. A third way involves the use of 
direct memory access (DMA) capability, 
in which the external controller reads 
the UART and deposits the data directly 
into memory. With this approach, the 
calling program need only initialize the 
external registers and go into a wait 
state until the transfer is complete, 
allowing the rest of the tasks to execute 
normally. This last approach is used on 
many large systems and constitutes 
what is called a channel. 


Managing the System 


As you can see, many levels of activi- 
ty are required to control a 
multiprogramming system properly. It is 
also apparent that some minimal hard- 
ware is required to prevent one user 
from obtaining exclusive control of the 
processor or writing over someone else’s 
program or data. The use of control
tables and a standard interrupt routine
is also important as a way of letting the
interrupt routines and I/O drivers know
which task had control of the processor
last.

If the user plans to run BASIC soft- 
ware or some other kind of language in- 
terpreter, the safety features discussed 
earlier may be implemented as part of 
the interpreter. To run a lower-level 
operating system that allows the user to 
generate assembler-level code will 
generally require the hardware de- 
scribed in this article, thus safeguarding 
the system and its users from accidental 
loss of programs or data. In general, the 
use of timed interrupts allows for a fairly 
even distribution of processor activity, 
and depending on the cycle time of the 
host system, between four and twelve 
tasks may be handled without too 
noticeable a delay in response time.






Introduction to
Multiprocessing

Mark Dahmke

My personal involvement with
microprocessors began in 1975 when I
constructed an 8008-based microcomputer.
Over the years, I have added to a
Martin Research 8080 board so that I
now have 32 K bytes of memory, dual
floppy disks, a modem, two serial ports,
and a 30 A power supply, all with an
assortment of S-100, Martin Research,
and custom wirewrap boards. The
system works as is, but it has fallen prey
to the troubles of any system that has
grown haphazardly to its limit.

The S-100 interface works properly,
but I must be extremely careful when adding any new
boards as they may not run on my bus. I
also can't run interrupts, direct memory
access (DMA), or dynamic memory
boards. Since I now have the money to expand,
I could go out and buy a new
mainframe and move all the S-100 cards
over to it. However, then I lose the Martin
Research boards which include both
serial ports, the keyboard interface, 8 K
bytes of memory, and a PROM board. I
could sell them and convert to S-100,
but why get rid of perfectly good hardware?

The answer to my dilemma is
multiprocessing. Multiprocessing means
joining two or more processors in either
a loosely or tightly coupled configuration.
Loose coupling means that the
machines are physically tied together
via machine-to-machine input/output
(I/O) ports, sharing the work load on a
program-by-program basis. Figure 1
shows such a system, where each processor
has its own main memory but
shares the disk drives. On large
machines programs are generally read in 
on cards (or from a terminal) and kept on 
disk in a queue (ie: waiting line) until the 
processor is able to work on them. Both 
processors can read jobs from the queue 
and begin executing them. As shown 
here, job A has requested to be run on 
processor 0, and so has job B. Job C 
wants to be run on processor 1, but job 
D has no preference. In the last case, job 
D will run on the processor that gets to it
first. 
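
The shared queue can be illustrated with a short Python sketch. The list-of-tuples queue and the name next_job are assumptions; the job names and processor preferences are the ones given in the example, with None standing for "no preference."

```python
def next_job(queue, processor):
    """Remove and return the first queued job this processor may run, or None."""
    for i, (name, preference) in enumerate(queue):
        if preference is None or preference == processor:
            return queue.pop(i)[0]
    return None

# Jobs A and B ask for processor 0, job C asks for processor 1, job D has no preference.
job_queue = [("A", 0), ("B", 0), ("C", 1), ("D", None)]
```

If processor 1 asks first it receives job C, skipping A and B; whichever processor next finds nothing ahead of job D that it is allowed to run takes job D.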

Once in a given processor, the job re- 
mains there. From the user’s standpoint, 
the system will appear to run twice as 
fast as a single machine. Note that the 
individual jobs are not running faster, 
but since two jobs may run at the same 
time, more work is done. A commonly 
used term for this arrangement is shared 
spooling or shared queueing. The 
primary advantage over two independent
processors is that both halves of
the multiprocessor can access the same
files on disk.

Figure 1: A loosely coupled multiprocessor. Each processor has its
own non-shared main memory, but both processors share the disk
drives. The job queue on disk lists each job's processor preference:
0, 0, 1, or either.


Tight Coupling 


Tight coupling involves extra hard- 
ware that allows both processors to ac- 
cess the same main memory. Figure 2 
shows a typical tightly coupled 
multiprocessor. The memory can be 
shared in a number of ways. First, the ex- 
pensive and difficult-to-obtain multiport 
memory chips can be used. These 
memory chips have two sets of all ad- 
dress, data, and control lines. Both pro- 
cessors can access the same byte at the 
same time. In a large installation, 
multiprocessors are used to split the 
workload on a time-slice basis within a 
program. Hence a user’s program may 
bounce back and forth between pro- 
cessors as the system attempts to run 
the two highest priority tasks (ie: pro- 
grams) at all times. 

The second approach involves direct 
memory access. Many microcomputers 
allow DMA, but very few people use it. 
Consider what happens on a typical 
microcomputer when two processors 
are on the bus and one begins a memory 
access. If it gains control of the bus, the 
other processor must be in a hold state, 
and vice versa. DMA may be useful for 
some I/O data transfers (ie: disk or tape 
at high speeds) where the microprocessor
cannot keep up, or in
graphics display hardware that accesses 
external memory, but DMA serves no 
useful purpose in a multiprocessor con- 
figuration. What good is it to have two 
processors when only one can perform 
at any instant? 

On a 6800 microprocessor, one can
complement the clocks and run two processors
on the same bus without conflicts;
however, I have not seen it done
commercially.


Shared Memory Blocks — 
a Compromise 


Perhaps direct memory access isn’t 
the answer, but it would be convenient 
to have shared memory which both 
machines can access. Several minicom- 
puters on the market already use this 
scheme. In figure 3, both processor 0 
and processor 1 have 64 K address 
spaces, but each one has its own area 
from the 8 K boundary up to the top. 
The bottom 8 K of memory is either built 
with multiple-port access technology or 
resolves references by sending wait 
states out to the other processor while 
servicing a request. This way, the DMA 
access is cut to a minimum because a 
given processor will only be held up for 
a small percentage of the time. 
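
A toy Python model may make the address split concrete. Everything about the class below is an assumption made for illustration, including its name, the use of byte arrays for memory, and the choice that processor 1 is the one held when both processors address the shared block in the same cycle.

```python
SHARED_TOP = 8 * 1024   # addresses below this value fall in the shared 8 K block

class SharedBlockSystem:
    def __init__(self):
        self.shared = bytearray(SHARED_TOP)
        # Each processor has a 64 K address space; only the part above 8 K is private.
        self.private = [bytearray(64 * 1024), bytearray(64 * 1024)]
        self.wait_states = [0, 0]

    def _memory(self, cpu, address):
        return self.shared if address < SHARED_TOP else self.private[cpu]

    def write(self, cpu, address, value):
        self._memory(cpu, address)[address] = value

    def read(self, cpu, address):
        return self._memory(cpu, address)[address]

    def simultaneous_access(self, addr0, addr1):
        """Model one cycle in which both processors issue an address.

        If both addresses fall in the shared block, processor 1 is held
        for one wait state while processor 0 is serviced.
        """
        if addr0 < SHARED_TOP and addr1 < SHARED_TOP:
            self.wait_states[1] += 1
```

A byte written by one processor below the 8 K boundary is visible to the other, while the same address above the boundary refers to two separate memories.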


Nonsymmetrical Machines 


Until now, I have considered only
those machines that are symmetrical;
that is, capable of executing identical
programs on either processor. Other
possibilities do exist. Most hobbyists try
to keep up with the latest technological
advances by playing musical hardware:
buying a Z80 to replace the 8080,
which replaced the 8008, and so on. In
my case, I have wanted to upgrade to a
Z80 for some time, but am reluctant to
change my existing system. I plan to
leave the byte-oriented devices (ie:
keyboard, video board, modem, etc) on
the 8080 Martin Research system and
move the ICOM floppy-disk controller
and 24 K bytes of memory over to the
Z80. The two systems will operate in a
loosely coupled environment with
shared I/O as a compromise between
fully loose and tight coupling. This involves
running two full handshaking
parallel ports between the two processors.
(See figure 4.) In this arrangement,
the old processor acts more as a
slave than as a second master. One of
the advantages of this configuration is
that the disk drives are on one system
and the other slow I/O devices are on
the other, giving the effect of overlapped
processing. Most floppy-disk interfaces
are processor intensive; they demand
exclusive use of the processor until
the input or output operation is complete.
In some cases, the disk may be
tied up for as much as a minute at a
time, forcing the programmer to wait.

Figure 2: A tightly coupled multiprocessor. Both processors share the
same main memory and also share disk storage.


In the consulting business, personnel
time is a far more valuable commodity
than machine time. Having a slave I/O
processor attached to the large machine
is practical because the user can continue
typing while waiting for the disk or
any other time-consuming operation to
finish.

When considering the loosely coupled
master-master or master-slave arrangement,
the processors must be able to communicate with
each other at a rather high rate. It is
assumed that the majority of the interprocessor
communication will be I/O
related; that is, the master processor
will load its I/O burden onto the slave
processor. IBM calls the slave processors
channels; some other manufacturers
call them peripheral processors.
Generally, a peripheral processor is
somewhat more intelligent than a channel.

There is a great deal of ambiguity in
the terminology. No clear-cut definitions
have been made, or are likely to
be made in the near future. Depending
on the sophistication of the controlling
software, an auxiliary processor might
resemble a channel, a slave processor,
or an attached processor (ie: a second
master processor that is oblivious to the
outside world) capable of executing the
same instructions as the master processor.


Interrupts

In microcomputers, there are several
ways of alerting the processor that a
device needs to be serviced. The most
common method is that of priority-based
interrupts. In this case, hardware
is built into the processor to halt activity
on command and transfer control to a
designated service subroutine. Later,
control is returned to the original program.

Another method is to simply poll each
device (ie: check a status bit to see if
anything needs service). Polling requires
processor time and implies that the program
has the necessary machine code to
handle the polling and that the code will
be executed periodically.

It becomes obvious that interrupts are
simpler to handle, but there are other
complications. Most disk-drive controllers
cannot tolerate interrupts. If you


impossible to recover properly. There
are several other reasons for not running

Hardware for a Loosely Coupled 
Multiprocessor 


Loose coupling is relatively easy: all
that is needed is a set of parallel or serial
ports with handshaking, one going in
each direction. (See figure 4.) This will
allow a single path for byte or block
transfers between machines. On my
system, however, I would like to hook
several devices to the slave. It would be
wasteful to run a parallel port for each
device, since the processor can only
read or write to one at a time. So why
not send along a device code or device
address in parallel with the byte being
sent across? (See figure 5.) That way, the
slave can poll the port and can use the
device code to determine what to do
with the byte of data. This approach is
very similar to a common technique
called time-slice multiplexing, where the
processor's attention is divided equally
among requesting devices. For example,
device code 0 may be used for keyboard
input, device 1 for console output,
device 2 for a modem (ie: dial-up), and
so on.

Figure 3: A tightly coupled multiprocessor with partially shared
memory. Each processor has its own memory (up to 56 K in this example)
but both share the bottom 8 K. Special hardware must be provided
to resolve the conflicts in bus usage when accessing the bottom 8 K
block. Typically, if both processors try to access the 8 K block at the
same time, the hardware will send out wait states to one processor
while servicing the other.

Figure 4: Master-slave configuration to be used in the author's system.
Processor 0 (Z80, master) owns the disk drives, and processor 1 (8080,
slave) owns all other I/O devices. The two parallel ports between
processors 0 and 1 are for I/O communications. Processor 1 will be
acting as a slave I/O processor, and will receive its commands and data
from processor 0.

In order to run interrupts, the same 
bus structure can be used, but the data- 
ready line (in figure 5) will generate an 
interrupt. The interrupt routine will read 
the device code and determine which 
routine to go to for processing the in- 
coming data. 
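
Whether the port is polled or raises an interrupt, the receiving side ends up doing the same thing: reading the device code and choosing a routine. A minimal Python sketch of that dispatch follows. The handler table, the received lists, and the name service_port are hypothetical; the device codes 0, 1, and 2 follow the example in the text.

```python
# Bytes received so far, per device (stand-ins for the real device routines).
received = {"keyboard": [], "console": [], "modem": []}

HANDLERS = {
    0: lambda byte: received["keyboard"].append(byte),   # device code 0: keyboard input
    1: lambda byte: received["console"].append(byte),    # device code 1: console output
    2: lambda byte: received["modem"].append(byte),      # device code 2: modem
}

def service_port(device_code, byte):
    """Route one byte by its device code; return True if it was accepted."""
    handler = HANDLERS.get(device_code)
    if handler is None:
        return False    # unknown device code: nothing to route the byte to
    handler(byte)
    return True         # the caller would now send the acknowledge

# Bytes for different devices may arrive interleaved on the same port.
for code, byte in [(1, 0x48), (2, 0x0D), (1, 0x49)]:
    service_port(code, byte)
```

Because each byte carries its own device code, the console receives its two bytes in order even though a modem byte arrived between them.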


Figure 5: Anatomy of the parallel port. Along with the 8 bit data bus, a
4 to 6 bit device address is sent. This allows the bus to be time-multiplexed
so that many devices may share the port. The figure shows
one full handshaking parallel port from the master processor to the
slave processor, with labels that include a parity bit and a protocol
error signal; the other port goes from slave to master.

Figure 6: Port timing characteristics. (a) shows the byte and device
code timing. (b) shows the signal timing of (a) for the device code
lines, data ready, and acknowledge. When data ready
becomes valid, the destination processor can read the data and do the
error checking. When it has done this, it will send back an
acknowledge, which will release the port for another transfer.


Software for a Loosely Coupled 
Multiprocessor 


After years of software development,
I have learned that there is both a quick
and dirty way to do something, and a
clean and organized way. It is always
easier to write quick and dirty software
because it is initially the least time-consuming.
But in the long run, it is far
more difficult to modify. Clean code requires
some additional thought and
planning but saves a lot of effort when
making changes. For most purposes, it
seems that elegance and flexibility go
hand in hand. In this section, I will
outline and discuss an elegant approach
to the I/O slave multiprocessor configuration
previously described.

First, I assume that the slave processor
has many input and output
devices attached to it and that the
master will use the two parallel ports to
communicate with all of these devices.
The slave will have to have subroutines
in order to work with each device.
Another assumption I am going to make
involves time multiplexing. Figure 6a
shows how the port may be multiplexed, 
with the device code changing as suc- 
cessive bytes go to different devices. 
The unit time interval for a byte going 
out on the port is defined by the dura- 
tion of the handshaking sequence. 
Figure 6b shows the timing of signals on 
the port. Note that the acknowledge line 
releases the port for another transmis- 
sion. With this configuration, bytes 
destined for many different devices may 
be mixed on a byte-by-byte basis. How 
are they separated on the receiving side? 
Now programming and design phil- 
osophy become important. The simple 
solution would be to write a top-down 
modular structure. For most purposes 
this would be quite sufficient. But what 
would happen if we had a multiple byte 
protocol for one or more of the devices? 
Perhaps the console display routine has 
provisions for an addressable cursor and 
requires 2 bytes of data in succession 
after the command byte is sent (com- 
mand, line number, column number). 
Can the next 2 bytes simply be read from 
the port? Remember that the port is be- 
ing multiplexed; the next 2 bytes might 
have a different device code. The top- 
down technique may work well for a 
nonmultiplexed port, but runs into real 
problems when the port is multiplexed. 


Coroutines 


I would like to introduce a new concept
before going on. Most people are
familiar with subroutines. Whether in
FORTRAN, BASIC, Pascal, or assembly
language, subroutines all do the same
thing: they allow frequently used segments
of a program to be referred to
without duplicating the segment each
time it is needed in the program.

In figure 7, subroutine A is called
twice, but after the return instruction is
executed in subroutine A, control is
passed to the instruction immediately
after the last instruction executed in the
main program (namely, the call A instruction).
No matter where subroutine
A is called, the return instruction reloads
the old address and continues where the
main program left off.

Coroutines are similar to subroutines,
but they do not return to the old address
as in a subroutine. In figure 8, control
appears to be oscillating (ie: flipping
back and forth) between programs A and
B. Whenever the coroutine is called, it
returns to a point just after the last executed
instruction of the other program.
This in effect is a crude form of time-sharing.
A more general form of this
mechanism is called multitasking or
multiprogramming. A thorough discussion
of multiprogramming can be found
in "Introduction to Multiprogramming,"
in this book. Multiprogramming
gives us a clean mechanism for handling
a time-multiplexed port. A main program
or task is set up for each device. A
generalized coroutine call is written that
keeps a table of tasks and where each one
left off when it encountered a coroutine
call. The supervisor program, which includes
the coroutine and task-switching
routines, does the polling and, upon
reception of a data-ready flag, will read
the device code and look through the
device assignment table. This table indicates
which task should receive the
byte of incoming data. The supervisor
reactivates the specified task by loading
the task's old stack pointer and popping
all registers (which are saved when
the coroutine was called) off the stack.
The reactivated task then reads the byte
and performs whatever functions are required.
This design is useful because it
allows multiple threads of program logic
without mixing them unnecessarily.
Each task can be written independently
of the others, without any register usage
conflicts.
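
Python generators behave much like the coroutines described here: each one suspends at a yield and later resumes exactly where it left off. The sketch below uses them as a stand-in for the stack-switching coroutine call, which is an assumption of this example and not the author's implementation. The task names, the device codes, and the three-byte cursor sequence (command, line number, column number) are used only for illustration.

```python
def console_task(screen):
    """Console task: waits for a command byte, then a line and a column byte."""
    while True:
        command = yield     # suspend until the supervisor delivers a byte
        line = yield        # the next console byte, however long it takes to arrive
        column = yield
        screen.append((command, line, column))

def modem_task(log):
    """Modem task: handles one byte at a time."""
    while True:
        log.append((yield))

def supervisor(stream, tasks):
    """Deliver each (device code, byte) pair to the task assigned to that code."""
    for device_code, byte in stream:
        tasks[device_code].send(byte)   # resume the task where it left off

screen, log = [], []
tasks = {1: console_task(screen), 2: modem_task(log)}
for task in tasks.values():
    next(task)              # run each task up to its first wait for a byte

# Console and modem bytes arrive interleaved on the multiplexed port.
supervisor([(1, 0x1B), (2, 0x41), (1, 5), (2, 0x42), (1, 12)], tasks)
```

The console task is written as if its three bytes arrived back to back, yet it still assembles the complete cursor command even though modem bytes were interleaved on the port.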


Figure 7: Flow of control when using subroutines. When subroutine A
is called, the hardware in the processor must save the current value of
the program counter so it can be restored when the subroutine
finishes. Thus a subroutine can be called from any point in a program
and will return to the instruction immediately after the subroutine call
instruction.


Figure 8: Flow of control when using coroutines. When coroutine B is
called, control is passed to B. In B, when coroutine A is called, control
is passed back to A to the instruction just after the first coroutine call
instruction. Hence control appears to be flip-flopping between A and
B. This is representative of a crude form of timesharing or multitasking.


Control Blocks and Tables


Earlier I mentioned that several look-up tables are used to determine
which task should be given control. The first control table is called the
channel-status table or CHST. (See figure 9a.) Entries in the CHST have a
byte for the device code, a task-assignment byte, and a status byte. If
the device codes are put in ascending order, the device-code byte may be
eliminated, saving a few bytes. The status byte includes a bit indicating
whether the device code channel has been opened (ie: ready for service) or
not. If it hasn’t and someone tries to use it, a device error will be
generated. Another bit in the status byte indicates whether the device
code is defined or not. This is analogous to the open/close bit, but
defines whether or not the device exists; if this bit is cleared, the
device cannot be opened. Two other bits in the status byte define whether
the device is to be used for input, output, or both.
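The status-byte checks described above can be sketched in a few lines of Python. This is only an illustration, not the author's code: the bit positions follow the layout given in figure 9a, while the constant names, the function check_transfer, and the error strings are assumptions made for the example.

```python
# Bit masks for the CHST status byte (positions as listed in figure 9a).
OPEN = 1 << 0        # channel has been opened
DEFINED = 1 << 1     # device exists on this system
INPUT_OK = 1 << 2    # device will allow data input
OUTPUT_OK = 1 << 3   # device will allow data output
CPU_NUMBER = 1 << 4  # which processor owns the device (0 or 1)


def check_transfer(status, writing):
    """Return None if the transfer is legal, otherwise an error string."""
    if not status & DEFINED:
        return "device undefined"
    if not status & OPEN:
        return "device error: channel not open"
    needed = OUTPUT_OK if writing else INPUT_OK
    if not status & needed:
        return "device does not allow this direction"
    return None
```

A printer entry might carry DEFINED | OPEN | OUTPUT_OK, in which case a write passes and a read is rejected.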


The Task I/O Table 


Another useful innovation is the task I/O table or TIOT. (See figure 9b.)
Each task or program must have its own TIOT. The task control table (TCT),
discussed in the previously referenced multiprogramming article, contains
an entry for each task, which includes a pointer to the TIOT for that
task. The subroutines that manage the port communications must check the
TCT to determine which program called them and the location of that task’s
I/O table. The TIOT has one entry for each device the program wishes to
use. The first byte in a TIOT entry is the unit number that the program
will use to refer to the device (ie: 0=console, 1=keyboard, 2=printer,
etc). The second byte contains the actual device code. This allows the
user program to reassign a device by changing the TIOT entry. For example,
you may wish to have console output routed to the printer or have keyboard
input come from a disk file. This also allows alternate consoles to be
defined.


CHST

DEVICE CODE | TASK ASSIGNMENT | STATUS

STATUS BYTE:
BIT  DESCRIPTION
0    OPEN/CLOSED
1    DEFINED/UNDEFINED
2    DEVICE WILL ALLOW DATA INPUT
3    DEVICE WILL ALLOW DATA OUTPUT
4    CPU NUMBER (0 OR 1)


Figure 9a: The channel status table (CHST). Each table entry has a
device code, a task assignment byte (hexadecimal FF, if unassigned),
and a status byte. The table is terminated by an FFH in the first byte of
the table entry after the last valid entry.


TIOT

NOTE: ONE TIOT IS REQUIRED
FOR EACH TASK.


Figure 9b: The task I/O table (TIOT). One TIOT must exist for
each task in either processor. A TIOT entry consists of a unit number
followed by the assigned device address. Using this scheme, device
reassignments are easy to handle.
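A task I/O table of this kind can be modeled as a short list of two-byte entries. The sketch below is hypothetical: the device codes and the function names find_device and reassign are made up for illustration, and a real table would live in memory as raw bytes rather than as a Python list.

```python
# One hypothetical TIOT: each entry is [unit number, device code].
tiot = [
    [0, 0x10],  # unit 0 = console
    [1, 0x11],  # unit 1 = keyboard
    [2, 0x1C],  # unit 2 = printer
]


def find_device(table, unit):
    """Scan the table for a unit number and return its device code."""
    for entry_unit, device_code in table:
        if entry_unit == unit:
            return device_code
    return None  # the task never declared this unit


def reassign(table, unit, new_device_code):
    """Point an existing unit at a different device code."""
    for entry in table:
        if entry[0] == unit:
            entry[1] = new_device_code
            return True
    return False
```

Calling reassign(tiot, 0, 0x1C) would route console output to the printer without touching the program that writes to unit 0.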


Shareable Devices 


Some devices must be restricted so that no other task can send data to or
receive data from them. The channel-status table (CHST) has a byte
reserved for task assignment. When a channel (ie: device code) is opened
by a task, the CHST must be updated to keep track of which task opened it.
Later, when the task tries to read or write to a device, the CHST is
checked to make sure the operation is legal. Without this checking
process, unusual events may occur. For example, suppose two tasks tried to
use the same printer. The printout would be a mixture of characters from
each task. When the channel is closed, the task assignment byte of the
CHST entry is set to hexadecimal FF to indicate that the device address is
unassigned.
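The ownership check can be illustrated with a small sketch. It assumes the CHST is reduced to a mapping from device code to the task-assignment byte; the function names and the choice to refuse a second open are assumptions of this example, not details given in the article.

```python
UNASSIGNED = 0xFF  # task-assignment value for a closed channel


def open_channel(chst, device_code, task):
    """Claim a device for a task; fail if another task already holds it."""
    if chst[device_code] != UNASSIGNED:
        return False
    chst[device_code] = task
    return True


def transfer_allowed(chst, device_code, task):
    """A read or write is legal only for the task that opened the channel."""
    return chst[device_code] == task


def close_channel(chst, device_code, task):
    """Release the device by writing FF back into the assignment byte."""
    if chst[device_code] == task:
        chst[device_code] = UNASSIGNED
```

With this check in place, a second task that tries to open the printer is turned away until the first task closes it.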


Opening and Closing Devices 


Before any data can be transferred on 
a channel, it must be opened. This pro- 
cedure is similar to opening a tape or 
disk file. The reason for doing this is to 
explicitly indicate that you wish to start 
reading or writing. The procedure also 
has a use when accessing tape or disk. 
When a device is opened that is assigned 
to a mass storage file device, the file 
name may be given instead of a device 
code. The supervisor should recognize 
this and establish a device-code link to 
the tape or disk I/O routine. 


Slave-to-Master Interface 


We now have a straightforward 
mechanism for handling the time- 
multiplexed port going from the master 
processor to the slave. Since there must 
also be slave to master communications, 
we could run multitasking on both processors.
But in my system I only want to
run one program on the master: either
the operating system, the assembler, the 
editor, or an application program. The 
problem here is similar to the one 
discussed earlier: without multiple 
threads of program logic, how can one 
handle a multiplexed bus? 

One approach would be to have one master channel (ie: port) I/O subroutine
that is called for all input or output requests. The registers would have
to be set up before calling the channel routine to indicate what function
(eg: read, write, open, close) is to be performed. Another register would
indicate the unit number to operate on, which is found in the task I/O
table. Once the subroutine has been called, it will have to perform many
things:

If the operation to be performed is an output-to-port:


(1) Check for an acknowledge.

(2) Check for errors.

(3) Output the device code.

(4) Output the data byte. (This should automatically set the data-ready
line.)

(5) Wait for an acknowledge.

(6) Check for errors on the port.

(7) If an error has occurred less than five times, go to (1) to retry;
otherwise, issue a write error.

(8) Return to calling program.


If the operation is an input, the problem becomes more complex:


(1) Check for data ready.

(2) Read in the device code.

(3) Read in the data and check for parity errors.

(4) If an error has occurred less than five times, issue a parity error,
and go to (1) to retry; otherwise, issue a hard I/O error.

(5) Return to calling program.


At this point there may be difficulties. Suppose the device code of the
byte is not the one desired? One alternative is to set up a memory buffer
as temporary storage for the bytes that come in but are not needed yet.
The other is to force the slave to send only what is wanted and let it do
the buffering. In the second case, an input operation, a device address
would be reserved for processor-to-processor commands. Whenever the master
wants something from the slave, it must issue a request for it on device
0. This makes the slave do the work, and drastically reduces the overhead
of the master processor.


Figure 10: The control blocks and service routines discussed in this
section: the task control table (TCT), with a task number, status byte,
stack pointer, and TIOT pointer for each task; the task input/output table
(TIOT); the channel status table (CHST); the inter-task data buffer; and
the channel I/O service subroutine (CHIO). Surviving caption text: "...
contains entries for each task in a given machine. Each task has (among
other things) a pointer to its TIOT. The CHIO (channel input/output)
subroutine may be called ..."


TASK CONTROL TABLE (TCT)
TASK INPUT/OUTPUT TABLE (TIOT) FOR TASK 1
CHANNEL STATUS TABLE (CHST)

USER TASK 1:
    SET UNIT #3

CHANNEL I/O SERVICE SUBROUTINE (CHIO):
    GET TIOT ADDRESS
    FIND UNIT 3 IN TIOT
    GET DEVICE ADDRESS
    CHECK CHST
    CHECK FOR VALID TASK NUMBER
    CHECK FOR VALID STATUS BITS
    IS DEVICE CONNECTED TO MASTER OR SLAVE (0 OR 1)
    IF MASTER
        PUT DEVICE CODE AND DATA INTO INTER-TASK DATA BUFFER
        THEN RETURN
    ELSE
        OUTPUT DATA TO PORT
        CHECK FOR ERRORS
        THEN RETURN
    ENDIF

STATUS 10111 MEANS:
    (1) DEVICE BELONGS TO PROCESSOR 1
    (1) DEVICE IS OUTPUT
    (0) DEVICE IS NOT INPUT
    (1) DEVICE IS DEFINED
    (1) DEVICE IS OPENED


Figure 11: Shown here are the same tables described in figure 10, with the
addition of user task 1. This task has loaded the registers and called the
channel I/O routine, which will see that task 1 wishes to output a byte to
unit 3. First, the routine gets the address of task 1’s task I/O table by
looking in the task control table (the task status byte will tell the
channel I/O routines which task is active; in this case, task 1). Next,
the routine scans the task I/O table to find the unit-device address
assignment for unit 3. In this example, the device address for unit 3 is
found to be hexadecimal 1C. The routine then scans the channel status
table for device 1C. When found, the task assignment byte is checked and
verified against the number found previously in the task control table. If
they aren’t the same, an invalid device code message is generated, and
control is returned to the calling task. If they are the same, the channel
I/O routine verifies that the device is defined and opened. Lastly, the
routine must check the processor number bit to see which processor the
device is on. If the device is on the other processor, the data and device
code are sent to the port. If the data is to go to a device on this
processor, the data and device code are put in the inter-task data buffer
for the destination task to read.
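The eight output-to-port steps listed earlier can be sketched as a retry loop. This is a minimal illustration under stated assumptions: FakePort is a made-up stand-in for the hardware, a missing acknowledge is treated as an error (the article does not say how it is handled), and the wait in step 5 is reduced to a single poll.

```python
class WriteError(Exception):
    """Raised when the port has reported five errors (step 7)."""


class FakePort:
    """Hypothetical port that reports an error a fixed number of times."""

    def __init__(self, failures=0):
        self.failures = failures
        self.sent = []

    def acknowledged(self):
        return True

    def has_error(self):
        if self.failures > 0:
            self.failures -= 1
            return True
        return False

    def write(self, byte):
        self.sent.append(byte)


def output_to_port(port, device_code, data, max_errors=5):
    errors = 0
    while True:
        ok = port.acknowledged() and not port.has_error()      # steps 1-2
        if ok:
            port.write(device_code)                            # step 3
            port.write(data)                                   # step 4
            ok = port.acknowledged() and not port.has_error()  # steps 5-6
        if ok:
            return True                                        # step 8
        errors += 1
        if errors >= max_errors:                               # step 7
            raise WriteError("write error on device %02X" % device_code)
```

The input case would follow the same shape, with the parity check in place of the acknowledge test.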


Distributing the Workload


As the word is defined, slaves are supposed to do the menial tasks. What
else can we make it do? Parallel processing is a possibility. Perhaps it
could be made to calculate equations that the master needs at a later
time. This type of processing requires tricky programming, but can be
done.


Loading and Starting a New Task


Assuming that there is a program to run on the slave, there must be a
manner in which to enter it into the slave and start it. First, either the
slave must have sufficient unused memory (at a known address) in which to
put the program, or the program must be in a relocatable format that can
be loaded at any address and run correctly. Several assemblers and
compilers are able to produce relocatable object code that can be
translated into absolute addresses and loaded with a small loader program.
Second, the master must have some way of adding the new task to the
slave’s task-control table (TCT) to initiate execution. The techniques
involved are quite system dependent and will not be discussed here. Once
the task is running, it may receive input from the master through an
assigned input-device address defined in the task I/O table. Note that
most of this discussion on distributing the work load also applies to a
tightly-coupled multiprocessor.


iting Devices on a Multiprocessor 


Since both processors may have I/O devices, it is necessary to tell each
one what it has and what the other processor has. An extra bit in the
status byte of each channel-status table entry could be used to indicate
what devices are where. For example, the slave may need to read bytes or
blocks from the disk, which is located on the master-processor bus. Also,
it is quite possible that a task may wish to communicate directly with
another task on the same processor. The solution is to set up a small
buffer that will be referred to as an intertask data buffer (ITDB). This
buffer contains two bytes, a device code and a data byte. If the device
code is set to hexadecimal FF, then the buffer is seen as being empty.
When a valid device code is placed in the intertask data buffer and a data
byte is loaded into the second location, the transfer request is said to
be posted.

The standard channel I/O subroutine may be called to perform the post,
because it will see that the desired device code is on the same machine.
As with any other I/O operation, the post will be performed and the
supervisor will again make a coroutine call to switch to another task. An
I/O request is always seen as a top priority task, so the supervisor will
look in the channel-status table (CHST) to find out which task must be
activated, and will transfer control to it. When the task gains control,
it will be able to call the channel I/O service routine and ask for the
incoming byte. The service routine will again recognize that the byte came
from an internal task, not the port, and will look in the inter-task data
buffer (ITDB). The channel I/O service subroutine must of course have the
necessary sequence of instructions to handle this extra buffer. Figure 10
shows the entire set of control blocks and service routines discussed in
this section. Figure 11 shows an example, where user task 1 wants to
output a byte to unit number 3.
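The two-byte inter-task data buffer can be modeled as follows. This is a sketch only: the class and method names are invented for the example, and refusing a post while the buffer is occupied is an assumption, since the article does not say what happens in that case.

```python
EMPTY = 0xFF  # device code that marks the buffer as empty


class InterTaskDataBuffer:
    """Two bytes: a device code and a data byte."""

    def __init__(self):
        self.device_code = EMPTY
        self.data = 0

    def post(self, device_code, data):
        """Post a transfer request; refuse if a byte is still waiting."""
        if self.device_code != EMPTY:
            return False
        self.device_code = device_code
        self.data = data
        return True

    def take(self, device_code):
        """Hand the byte to the receiving task and mark the buffer empty."""
        if self.device_code != device_code:
            return None
        data = self.data
        self.device_code = EMPTY
        return data
```

The sending task calls post, the supervisor switches to the task that owns the device code, and that task calls take to collect the byte.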


Summary 


It becomes apparent that even a sim- 
ple master-slave interface may become 
quite complex if designed with flexibili- 
ty in mind. Multiprocessing with shared 
memory blocks may be somewhat easier 
because the separation between the pro- 
cessors is greater. Each byte of the 
shared memory could be considered a 
separate device address, greatly simpli- 
fying the amount of buffering and 
checking that is needed in a tightly- 
coupled multiprocessor. 

For the average hobbyist, multiprocessing has always been a curiosity,
something that everyone dreams about but few people ever do. This
discussion should be treated as a conceptual overview, not as the final
word on the subject. I chose the approach described above because of my
nonstandard hardware, and also because of my background knowledge and
hardware experience. Others may choose totally different and possibly
better configurations.


REFERENCES 


1. Abrams, Marshall D, and Philip G Stein. Computer Hardware and
Software. Addison-Wesley Publishing Company, Reading MA, 1973.

2. Dahmke, Mark C. “Introduction to Multiprogramming.” September 1979
BYTE. BYTE Publications, Inc, Peterborough NH.

3. Davis, William S. Operating Systems. Addison-Wesley Publishing
Company, Reading MA, 1977.

4. Hsiao, David K. Systems Programming. Addison-Wesley Publishing
Company, Reading MA, 1977.

5. Martin, Donald P. Microcomputer Design. Martin Research Ltd,
Northbrook IL, 1976.

6. Tanenbaum, Andrew S. Structured Computer Organization. Prentice-Hall,
Inc, Englewood Cliffs NJ, 1976.


Microcomputer Time-Sharing


With Pointers to Further Reading


Kenneth J Johnson


Until I read Steve Ciarcia’s article “Having a ‘Private Affair’ with your
Computer” in April 1977 BYTE, page 18, I had not envisaged my 6800 or my
8080 as the basis of a time-sharing system. Then I asked myself, “Why not?
Why shouldn’t a microprocessor be capable of supporting a time-sharing
system?” I subsequently had the opportunity at the Online Conference held
in London, England on May 14, 1977 to see Robert Uiterwyk’s 6800-based
multi-user system. This prompted me to search back through the early
time-sharing literature to check on the problems their designers
encountered and their solutions. This article is the outcome. It does not
set out to specify in detail how a time-sharing system can be established,
but it does deal with the main problems involved. Perhaps it will provide
a starting point for readers’ systems development.


Requirements


Time-sharing has been defined in many different ways. For our purpose it
will be taken to mean the concurrent and effective utilization of computer
resources by several users, possibly at remote terminals. It will imply
multiprogramming, possibly multiprocessing and, in general, multiple
access to system resources.


The key requirement in any multiprogramming or time-sharing system is that
programs and data should not be bound, that is, converted into a
hardware-dependent form, until the moment of execution. This requirement
has many implications and may involve many problems, some of which have
been solved in different ways with varying degrees of success. This
article examines what is perhaps the main problem: relocating programs and
data in a multiprogramming environment. The related problems of scheduling
and priority systems, memory addressing algorithms, and resource
allocation are also discussed briefly.


The Problem 


A time-sharing system should be designed to execute user programs, to
provide reasonable service, and to satisfy each user’s requirements. This
means that each user should believe that he has all the benefits of a
dedicated computer. It is the basic philosophy of time-sharing and leads
directly to the concept of virtual machines linked to physical computer
resources through address-mapping tables.

Typically, individual user programs are allowed exclusive use of the
computer resources in some order of priority for short periods. They are
stopped after a certain time, frequently before completion, to allow other
user programs to be given their exclusive use of resources. They are
continued at some future time from the point where they were stopped, in
either the same memory area or a memory area different from the one they
were allocated when first allowed to run.

To be able to continue a program in this way, the system must have
facilities to preserve the status of a program when it is stopped and to
restore it when it is resumed. That is to say, at the point in time when
one user’s program is stopped and another user’s program is resumed, the
instantaneous description of the former program must be saved and the
description of the latter restored. These instantaneous descriptions are
typically referred to as the current state of the user program. The state
of a program typically contains such information as the contents of the
accumulators, program counter, and condition code register. The state
might also contain pointers to the address-mapping tables which determine
the correspondence between virtual and physical addresses.

To explain this process in more detail, it is necessary to examine the
factors which make multiprogramming possible and to study a typical system
in operation.


state        condition

active       in a working state.
wait         ready to run whenever brought into main memory.
user wait    waiting for the user to issue a command.
I/O wait     temporarily held up waiting to be serviced by I/O device.
file wait    temporarily delayed until another user program has finished
             using requested program or data file.
dormant      stopped running and has returned control to supervisory
             program, but its machine conditions have been preserved.
             terminated.


Table 1: All possible states that a program may exist in at a particular
point in its execution cycle.


Multiprogramming Requirements


Technically, there are a number of considerations which decide whether it
is possible to run programs together. In the book Computer Timesharing
(see reference 7), Popell specifies a minimum of five:


• a supervisory program referred to as the executive, monitor, or
  supervisor

• an interrupt processing system

• memory protection facilities to prevent one program from destroying
  others

• dynamic program and data relocatability so that the same routine can be
  reentrant. That is, the routine can be used, unmodified, in different
  memory locations at different times

• direct access facilities, or at least the facility for the convenient
  addressing of peripheral equipment. (For personal computers the floppy
  disk is the typical example of a direct access device.)


Typically, user programs to be run are 
stored in auxiliary memory, usually disk, 
readily accessible so that the super- 
visory program can switch them into 
main memory when their times to 
operate arrive. Each program is allocated the required area in main
memory, and that area is protected, by either hardware or software, from
interference by other programs. Any instruction attempting to address an
area outside the allocated memory block is trapped and prompts an error
message.

A system of priorities is usually implemented. The supervisory program
permits the execution of the program with the highest priority until such
time as it is suspended for some reason. Priorities are usually determined
by a scheduling algorithm which is used by the supervisory program to keep
a record of the status of each user program. Table 1 lists all the
possible states of a program at a particular point in time.

If, by bringing a program into its area 
in main memory, there is a storage con- 
flict, the program with the lower-priority 
status must be restored to its place in 
auxiliary memory. This process is 
variously called swapping, switching, 
push-pull or roll-out, roll-in. 

The most common cause of program suspension is a peripheral operation such
as input/output (I/O). But there are others, such as a machine or program
error or the lowering of priorities. Until suspended, however, user
programs run for periods of time determined by the scheduling algorithm.
At the end of each program’s appropriate time slice (or when it changes
status) the supervisory program determines which user program is to be run
next. The state of the program to be suspended (contents of accumulators,
index registers, condition code register, etc) will then be saved either
in a supervisor’s stack or dumped to auxiliary memory.

The supervisory program then retrieves the next user program from
auxiliary storage, together with that program’s old state. It loads this
program into main memory, processes it, restores it, proceeds to the next
user program and so on, until it returns to the first user program to give
it a second burst of processing if required. Then it continues the cycle.
It can be seen that the quintessential function of the supervisory program
in a time-sharing system is scheduling.
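Saving and restoring a program's state can be sketched as below. The register set is reduced to three fields and the saved states are kept in a dictionary; both choices, along with every name in the block, are assumptions made for illustration rather than a description of any particular supervisor.

```python
from dataclasses import dataclass, replace


@dataclass
class State:
    accumulator: int = 0
    program_counter: int = 0
    condition_codes: int = 0


class Supervisor:
    def __init__(self):
        self.cpu = State()  # registers of the program now running
        self.saved = {}     # program name -> state saved at suspension

    def suspend(self, name):
        """Copy the live registers into the save area."""
        self.saved[name] = replace(self.cpu)

    def resume(self, name):
        """Reload the registers from the state saved earlier."""
        self.cpu = replace(self.saved[name])
```

A resumed program continues from the program counter it had when it was stopped, whatever ran in between.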


Scheduling


On early machines, programs were assembled into the part or parts of main
memory they were to occupy during run time in much the same way as they
are on microcomputers today. If a large program required too much memory,
it was necessary to assemble the program in sections, transferring each
section as it was completed to auxiliary storage and restoring it (if
necessary in overlays) immediately prior to entry. For this purpose, a
suitable portion of memory was reserved for the segment of the program
being assembled, and for each instruction two separate addresses had to be
recorded: one giving the address of the current instruction and the other
indicating the address it would occupy at run time. With elaborations,
this technique became the basis of early time-sharing systems.

Basic to the running of these early systems was the concept of independent
peripheral operation. The processor, having initiated an I/O routine for
one program, could then proceed to service the computational needs of
other programs until the I/O routine signaled its completion by
interrupting the processor operation. For various reasons, these
time-sharing arrangements did not fully utilize even the relatively slow
storage-access time on some computers. The multiprogramming concept was
developed fully to realize this potential. The logic was incontrovertible:
if the machine had spare memory and spare peripherals, these could have
been utilized by a second program. If this still left unused capacity, why
not load a third program to use the peripherals and access time not
required by the first and second programs, and so on.

Tsujigado showed that it was 
theoretically possible to process 
simultaneously a large number of pro- 
grams (eg: 256) in the conversational 
mode. (See reference 8.) Although 
theoretically possible, this would be im- 
practical even now on large computers 
because of the large memory require- 
ments. In consequence, it is necessary to 
resort to swapping techniques, and a 
suitable scheduling algorithm. 

The swapping techniques adopted in- 
itially depended upon the hardware 
design; the control mechanisms varied 
widely between manufacturers and be- 
tween models. Some hardware is still re- 
quired for effective control of the pro- 
cess, but the software usually provides 
the necessary control procedures. In 
“Computer Software” Archibald et al 
specify the necessary software features. 
(See reference 1.) They include: 


• a means of reserving memory and peripherals for exclusive use by
  individual programs for predetermined periods of time

• a means of switching from one program to another to optimize computer
  performance

• facilities to relocate programs dynamically during execution as the
  overall pattern of programs in the computer changes


The effect of these routines is to provide 
multiprogramming facilities which 
enable many users to initiate programs 
and to schedule them through the 
system according to their relative 
predetermined priorities. 

The simplest system is based on a cir- 
cular queue for round-robin scheduling. 
Each program accepted into the system 
is assigned a fixed time slice and pro- 
cessor operation is switched from one 
program to another in round-robin 
fashion until each program is com- 
pleted. In this arrangement, only one ac- 
tive user program is in main memory at 
one time. Other active programs are 
held on disk. 

In other systems several user programs may simultaneously reside in main
memory. The operational switching between them is controlled by a clock
which is used to generate an interrupt to signal the processor that a
certain time period has elapsed. The scheduling algorithm is then entered
every time a clock interrupt occurs. If it is found that the program in
main memory has exhausted its time slice or has changed its status, that
program is swapped for the next program in the queue.
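Round-robin scheduling with a fixed time slice is easy to simulate. In the sketch below each program is described only by the number of clock ticks it still needs; the function name round_robin and the trace format are assumptions of this example.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """Simulate a circular queue of programs with a fixed time slice.

    jobs maps a program name to the ticks it needs. Returns the order in
    which programs ran, as (name, ticks_run) pairs.
    """
    queue = deque(jobs.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        trace.append((name, run))
        if remaining > run:
            # Slice exhausted before completion: back of the queue.
            queue.append((name, remaining - run))
    return trace
```

With two programs needing 3 and 5 ticks and a slice of 2, the processor alternates between them until the shorter one finishes, then serves the longer one alone.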

Most sophisticated installations of 
any size find the need to operate a 
system of queues. The appropriate 
queue to be serviced by the processor at 
any particular time will be selected ac- 
cording to priority and program type by 
the scheduling algorithm. Programs are 
initiated, or released for processing by 
being selected from the tops of the 
various queues which are formed in ac- 
cordance with the particular installa- 
tion’s design philosophy. In addition to 
systems of queues, the supervisory pro- 
gram normally has to deal with systems 
of priorities. Again, what determines 
these priorities will be a matter of design 
philosophy. Various criteria are used in 
practice. Usually it is possible for the 
system itself to cause priorities to be 
modified while programs are being 
queued. Such modifications are 
especially desirable in real-time systems 
because one program might be con- 
tinually bypassed or because a deadline 
is approaching and the program con- 
cerned is not being serviced. 

From time to time it may be that a 
program being queued will have to take 
precedence over a program being ser- 
viced. Downgrading of priorities hap- 
pens often in scheduling systems. To 
facilitate this, some operating systems 
provide a roll-in roll-out facility which 
enables the supervisory program to 
make a request for processing time on 
behalf of a higher-priority program in 
the queue. This will result in a lower- 
priority program being rolled out to 
enable the new program to be processed.
Programs rolled out in this way
are written into temporary storage along
with their current status. When changing
circumstances permit the reloading of 
programs temporarily suspended, the 
supervisory program will automatically 
roll in these programs and they will 
restart from where they left off. 

It may be that the exact locations in 
memory which such programs and their 
data were using are no longer available. 
To deal with this situation, operating 
systems provide the facility to relocate 
programs dynamically. 


Scheduling Methods 


To summarize the discussion so far, 


there are basically two methods of 
scheduling: 


• simple swapping systems with only one program at a time residing in
  main memory for a fixed unit of time in accordance with a system of
  priorities

• elaborate systems which overcome the disadvantage of only one user
  program in main memory at a time with consequent waste of time due to
  switching


This necessity of switching programs in- 
to and out of main memory at speeds 
approaching the internal clock rate 
leads to further problems which can on- 
ly be solved with additional hardware 
and software facilities. In particular, 
since a given user program does not 
always get loaded into the same place in 
memory, it leads to addressing prob- 
lems. 


Addressing Techniques 


In most systems, individual program- 
mers will have to write their programs 
without knowing which other programs, 
if any, will share main memory with 
theirs. The implication must be that they 
will need to use symbolic addresses that 
will be converted to absolute addresses 
at some time by the supervisory pro- 
gram when allocating memory space 
and peripherals to the various programs. 
This necessity has led to the present 
time-sharing philosophy which requires 
the conceptual separation of absolute 
storage addresses from the logical 
system addresses. 

In a multiprogramming system, 
resources are not normally allocated to 
programs until execution time. Since the 
physical resources allocated may be dif- 
ferent during each time slice, it is essen- 
tial that the run time representation of 
programs should be in hardware-in- 
dependent form. This means that the ad- 
dresses in particular should be virtual 
addresses. Physical addresses will be 
represented by an address-mapping 
table which will be updated whenever 
programs are moved from main memory 
to temporary storage and vice versa. 

As Wegner points out, the structure of the address-mapping table will
depend not only on the relation between the virtual address space and the
physical address space, but also upon the hardware facilities available
for performing address mapping. For example, in “Addressing Structures”
Gammage recalls that the need for dynamic program relocation was met on
second generation machines by the provision of a single base register, the
contents of which were added to a virtual address generated within the
program to map it into an actual main storage address. (See reference 6.)
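Relocation through a single base register amounts to one addition per memory reference. The sketch below adds a bounds check as well; that check is an assumption borrowed from the memory-protection requirement discussed earlier, not something attributed to Gammage, and the names are illustrative.

```python
def relocate(virtual_address, base, limit):
    """Map a program-relative address to a main storage address.

    base is the contents of the base register; limit is the size of the
    block allocated to the program.
    """
    if not 0 <= virtual_address < limit:
        raise MemoryError("address outside the allocated block")
    return base + virtual_address
```

Moving the program only requires loading a new value into base; the addresses inside the program stay the same.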

The major drawback here was that the program had to be moved between main
storage and temporary storage as a single unit, a wasteful process where
large programs are involved. It also meant that no program could be larger
than the available main memory space. To overcome these problems, more
elaborate addressing structures were devised. These structures reflected
the hierarchical organization of problem-oriented programs and the need in
real-time systems to provide for the organization of sets of independent,
multiprogrammed jobs. To give the facility of dynamic program relocation,
for example, some machines were fitted with special hardware. IBM built
upon the addressing system of the IBM 360, which allowed only two levels
of addressing, and provided a third level. They did this by providing two
sets of additional base registers, one set to act in the same way as the
base registers of the IBM 360, being accessible to the programmer. The
other set, sometimes known as segment registers, accessible only to the
supervisory program, are used in allocating storage.

Gammage outlines three such schemes, but suggests that because these
schemes use variable length segments as the basic unit for storage
swapping, they are very inefficient in terms of storage utilization. Their
inefficiencies cannot be overcome completely unless a full paging system
is employed, using fixed length units for swapping.


Paging


Most modern machines provide some kind of virtual memory structure if they
are to be used for multiprogramming. This addressing space may be provided
by hardware or created interpretively by software. Most modern systems
also interpose an address-mapping structure between virtual and physical
addresses.

Typically, the virtual address of a word in memory consists of two parts.
The first refers to a page number, a fixed size block of main memory. The
second refers to a location within the block. In operation, secondary
memory is connected to these blocks through high-speed I/O devices that
permit programs to be swapped directly from disk into any one of the main
memory blocks without interfering with processor operation. This process
is known as direct memory access and allows execution of one user program
in one block of memory while programs are being swapped to and from
another block.

Main memory is similarly divided into 
physical pages, each capable of han- 
dling one page of a program or block of 
data. Program pages, although the same 
size as main memory pages, will not 
necessarily be contiguous in main 
memory and may well occupy different 
main memory pages at different times. 
One of the functions of the supervisory 
program in a paging environment is to 
form and keep up to date a page table 
which establishes a mapping of the pro- 
gram and data pages into physical 
pages. By this means, the address of a 
page within a program is transformed 
via the page table into an absolute 
memory location. 
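The mapping just described can be sketched in a few lines of Python. This is only an illustration: the page size of 256 words and the contents of `page_table` are assumed values, not taken from any particular machine.

```python
PAGE_SIZE = 256  # words per page (assumed for illustration)

# Hypothetical page table kept by the supervisory program:
# program page number -> physical page number.
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_address):
    """Map a virtual address to an absolute address via the page table."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page {page} is not in main memory")
    return page_table[page] * PAGE_SIZE + offset

# Page 1, location 4 lands in physical page 2.
absolute = translate(1 * PAGE_SIZE + 4)
```

Note that program pages 0, 1 and 2 are contiguous in the program but scattered in main memory; moving a page only requires updating its entry in the table.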

In practice, to achieve dynamic 
relocation, it is necessary to extend the 
instruction address to include a segment 
number as well as a page and location 
number and to leave the binding of ad- 
dress parameters until run time. The seg- 
ment number is then used to access a 
segment table belonging to the user 
whose program is running at that instant. 
The reference in the segment table is to 
the page table which in turn maps onto
the physical page and through this to the 
physical address. 
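A minimal sketch of this two-step lookup follows, again in Python. The table contents and the names `segment_table` and `translate` are hypothetical; the point is only that each memory reference costs two table interrogations before the physical address is known.

```python
PAGE_SIZE = 256  # assumed page size

# Hypothetical segment table for the user currently running:
# segment number -> that segment's page table.
segment_table = {
    0: {0: 4, 1: 9},   # procedure segment
    1: {0: 6},         # data segment
}

def translate(segment, page, offset):
    """Resolve (segment, page, location) to a physical address at run time."""
    page_table = segment_table[segment]   # first interrogation
    physical_page = page_table[page]      # second interrogation
    return physical_page * PAGE_SIZE + offset

address = translate(0, 1, 16)   # segment 0, page 1, location 16
```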

This scheme can be very clumsy and 
take too long, unless the machine is fit- 
ted with additional registers which per- 
mit the development of an associative 
memory. The associative memory com- 
bines the segment and page numbers, so 
that only one interrogation is required to 
find the number of the physical page 
containing the appropriate address. 
Systems in which page registers are 
designed to be associatively accessed 
operate various page turning algorithms 
which determine: 


• whether certain pages are in memory

• whether pages are to be preserved or
overlaid

• how recently pages have been used
so that, if need be, they can be
disposed of when new pages are brought
into memory



These systems are the basis of the virtual 
memory concept which in turn provides 
the means for dynamic relocation. 
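One way to picture the associative memory and a page turning algorithm together is the sketch below. It is an assumption-laden model, not a description of any real machine: the class name `AssociativeMemory` is invented, the replacement rule is plain least-recently-used, and the actual transfer of page contents to and from secondary memory is left out.

```python
from collections import OrderedDict

class AssociativeMemory:
    """Maps (segment, page) to a physical page in one lookup, with LRU disposal."""

    def __init__(self, physical_pages):
        self.free = list(physical_pages)   # physical pages not yet in use
        self.entries = OrderedDict()       # least recently used entry first

    def lookup(self, segment, page):
        key = (segment, page)
        if key in self.entries:            # the page is in memory
            self.entries.move_to_end(key)  # record that it was just used
            return self.entries[key]
        if self.free:                      # page fault with room to spare
            physical = self.free.pop(0)
        else:                              # dispose of the least recently used page
            _, physical = self.entries.popitem(last=False)
        self.entries[key] = physical
        return physical

memory_map = AssociativeMemory([0, 1])
first = memory_map.lookup(0, 0)
second = memory_map.lookup(0, 1)
```

With only two physical pages, a third distinct page forces out whichever of the first two was used least recently.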


Dynamic Relocation 


There is a clear need for dynamic 
relocation in a time-sharing system. In 
general, a program consists of instruc- 
tions and data. While being executed it 
will contain references to intermediate 
results. These will need to be mapped or 
translated into references to specific 
parts of the machine (eg: machine ad- 
dresses, device numbers, etc). This can 
be accomplished at three different 
times: 


• During compilation, assembly, or
translation into machine code. The
result is an absolute program which
will be assigned to the same memory
locations and use the same
peripherals each time it is run,
assuming they are available. (This is the
most common scheme for user
programs in typical personal computers.)

• When the program is loaded. Most
machines have a relocating loader
which enables programs to be
relocated statically.

• During execution, using dynamic
relocation.


In multiprogramming it is difficult, if 
not impossible, to allocate memory con- 
currently to two or more independently 
written programs if they are absolute 
programs. The allocation method re- 
quires that the particular combination 
of programs to be run at any one time 
and their storage requirements are 
known in advance. This is information 
that is not always available when the 
programs are written. 

If the absolute addresses are left un- 
translated by the assembler or compiler 
and translated by a relocating loader in- 
to actual addresses only when the pro- 
gram is loaded for execution, the par- 
ticular combination of programs to be 
loaded together can be decided just 
prior to loading. This method is known 
as static relocation. Using static reloca- 
tion it is possible, with a relocating 
loader, to allocate memory to a program 
each time it is executed, provided: 


• The program can be separated into a
data part and a procedure part.

• The procedure part is never modified
during execution.

• The data part, including the contents
of registers at the time of interrupt,
contains no absolute memory
addresses.

• When the program is interrupted, the
data part is dumped onto auxiliary
storage.


These four conditions are not difficult to 
achieve. Nevertheless, the relocation of 
an interrupted program by this method 
has a number of significant drawbacks, 
which are summarized by Denning in his 
article “Virtual Memory.” 

In dynamic relocation, the translation 
of virtual addresses to main memory ad- 
dresses is delayed until the last possible 
moment (ie: until access to memory is 
required in running the program). 
Because the program contains no ab- 
solute addresses, it is independent of the 
actual memory allocation it receives. 
This means that it can be interrupted at 
any time and subsequently reloaded in- 
to a different part of memory without 
modification. This desirable facility can 
only be achieved at the expense of
additional hardware and more complex
instruction formats. The added complexity
is necessary since instructions in general
must now hold untranslated addresses in
a form appropriate to the relocation
technique adopted.

There is also the related problem of 
storage protection, that is, the need to 
prevent user programs from interfering 
with each other while being processed. 
The usual solution to this problem is to
allow them to operate only in
well-defined areas of memory, with
unrestricted access to all parts of
memory reserved for the supervisory
program. Frequently the
technique used to achieve dynamic 
relocation can also be used to effect 
storage protection. 
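A simple way to see how one mechanism can serve both purposes is a base and limit pair per task. The sketch below assumes that scheme (it is one possibility among several, and the names `Task`, `access` and `ProtectionError` are invented for the example): the base is added at the moment of each memory access, and the same check that applies it refuses addresses outside the task's area.

```python
class ProtectionError(Exception):
    """Raised when a task addresses memory outside its own area."""

class Task:
    def __init__(self, base, limit):
        self.base = base     # where the task currently sits in main memory
        self.limit = limit   # size of the area it is allowed to use

def access(memory, task, address, value=None):
    """Read (or write, if value is given) a program-relative address."""
    if not 0 <= address < task.limit:
        raise ProtectionError(f"address {address} is outside the task's area")
    physical = task.base + address   # translation delayed until the access
    if value is None:
        return memory[physical]
    memory[physical] = value

memory = [0] * 64
task = Task(base=16, limit=8)
access(memory, task, 3, value=99)

# The supervisor moves the task; only the base changes, not the program.
memory[40:48] = memory[16:24]
task.base = 40
moved_value = access(memory, task, 3)
```

The program's own addresses (here, 3) never change, which is why it can be reloaded elsewhere without modification.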


Conclusion 


Many programs running concurrently 
in a multiprogramming environment 
typically require far larger total memory 
space than is available in a particular 
system. The virtual memory concept 
and dynamic relocation techniques 
outlined here have solved many of the 
problems of managing and optimizing 
the use of large, hierarchical memories. 
These techniques are often seen in large 
computer systems and in principle can 
be adapted for use in microcomputer 
time-sharing systems.


REFERENCES 


Archibald, HIA, et al. “Computer Software.”
Journal of the Institute of Administrative
Management. England, 1966.

Coffman, EG Jr, and L Kleinrock. “Computer
Scheduling Methods and their
Countermeasures.” SJCC, volume 32, AFIPS, 1968.

Denning, PJ. “Virtual Memory.” Computing
Surveys, volume 2, number 3, Sept. 1970.

Dennis, JB. “Segmentation and the Design of
Multi-programmed Computer Systems.” IEEE
International Convention Record, part 3,
1965.

Dennis, JB, and EL Glaser. “The Structure of
On-line Information Processing Systems.”
Information System Sciences: Proc of 2nd
Congress, edited by DW Walker, 1965.

Gammage, ND. “Addressing Structures.”
Journal of British Computer Manufacturers,
1966.

Popell, SD, editor. Computer Timesharing.
Prentice-Hall, Englewood Cliffs NJ, 1966.

Tsujigado, M. “Multi-programming, Swapping
and Program Residence Priority in the FACOM
230-60.” SJCC, volume 32, AFIPS, 1968.

Wegner, P. “Machine Organization for
Multi-programming.” Proceedings of the ACM
National Meeting, 1967.




Time-Sharing: 
Squeezing the Most 
From Your Micro 


Sheldon Linker 


Although one normally thinks of
time-sharing as only working on large
computer systems, it is possible to run it
even on small systems. Many of the newer
large-scale time-sharing systems use
virtual memory and swapping, which are not
possible or practical on smaller
machines. Virtual memory requires
mapping hardware (ie: a machine with
interruptible instructions, such as an IBM
370). Swapping requires a reasonably fast
disk, which could cost as much as $2000.
What we are left with is an in-core
system that keeps everything running in
real memory at all times.

The first consideration is the assembler
and loader. In your current system, a
program's location can be assigned only at
assembly time. On a time-sharing system,
the programmer may not know where
the program will be located in memory.
The reason knowledge of this location is
conditional is that a decision point in the
design of the system has been reached. If
the system is to be nonrelocatable, the
programmer may define the location of
the program. The problem that arises
here is that if, at the time the program is
to run, the place in memory that the
program was supposed to run in is already
occupied, it cannot be loaded. On the
other hand, if the system is capable of
relocating, the program can be put
anywhere in memory. This produces the
additional benefit that subroutines do not
have to be assembled with the program. 
To perform this relocation the assembler 
leaves offset information in the object 
tape or file which the loader will interpret 
as it goes. One possible relocation code 
scheme is shown in table 1. Of course, all 
sorts of schemes are possible. Note that 
relocation alone will take some amount 
of coding and execution time. 
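To make the loader's job concrete, here is a rough Python sketch of a loader for a scheme of the kind table 1 lists. The record names (`rel3`, `addr2`, `skip` and so on) are invented stand-ins for header codes, since the article does not give numeric codes, and only a subset of the commands is handled. Addresses are assumed to be 16 bits, high byte first, as on the 6800.

```python
def relocate(hi, lo, base):
    """Add the relocation factor to a 16-bit address stored high byte first."""
    addr = (((hi << 8) | lo) + base) & 0xFFFF
    return [addr >> 8, addr & 0xFF]

def load(records, memory, base):
    """Place a module in memory at base; return its start address, if any."""
    loc, start = base, None
    for kind, *data in records:
        if kind == "skip":            # reserve an uninitialized buffer
            loc += data[0]
        elif kind == "start_rel":     # start address relative to first location
            start = base + data[0]
        elif kind == "end":
            break
        else:
            if kind == "rel3":        # instruction byte + relocatable address
                data = [data[0]] + relocate(data[1], data[2], base)
            elif kind == "addr2":     # bare relocatable address value
                data = relocate(data[0], data[1], base)
            elif kind not in ("byte1", "abs2", "abs3"):
                raise ValueError(f"unknown record type {kind!r}")
            memory[loc:loc + len(data)] = data   # absolute kinds are copied as-is
            loc += len(data)
    return start

# A tiny module assembled as if it started at address 0 (bytes are illustrative).
module = [
    ("rel3", 0xB6, 0x00, 0x05),   # instruction referring to offset 5
    ("byte1", 0x39),
    ("skip", 1),
    ("byte1", 0x2A),              # the data byte at offset 5
    ("start_rel", 0),
    ("end",),
]
memory = [0] * 0x400
start = load(module, memory, base=0x0200)
```

After loading at 0200 hexadecimal, the address inside the first instruction reads 0205 rather than 0005, while the one-byte records are untouched.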

The second consideration is the alloca- 
tion of system resources. In most cases 
this should concern only input/output 
(I/O) devices, although there may be 
some systems with interrupts not 
associated with I/O devices. There are 
basically three types of I/O devices. The 
first and most common type of device is 
the single owner. This is a device which 
can only be used by one task at a time: a 
task is a program running in the time- 
sharing system. An example of a device 
which probably should be single owner is 
a cassette recorder. It would just not be 
particularly helpful to have someone 
else’s data in the middle of your program. 

The second type of I/O device is the 
shareable unit. The most common ex- 
ample of this is the floppy disk. For a disk 
to be correctly shared, the operating- 
system routine which is handling the disk 
must reposition the heads every time the 
disk is used. Most systems already use 
this method, but there are those that 




Command to Run Time Loader Explanation 




Start absolute loading: The header code is followed by the absolute 
start address. In this case, the loader behaves 
as any other loader. There is no relocation of the 
data and instructions that follow. Loading starts 
at the address given.


Start relative loading: The header code is followed by an address. 
Loading begins at the first available address, as 
determined by the operating system. From this 
point on, a relocation factor will be added to all 
instructions and data flagged for relocation. 


Skip bytes: This code is followed by a number designating
the number of bytes to be skipped. This is
useful in defining uninitialized buffers and is
more efficient than repeated uses of code to
reserve 1 or 2 bytes (see below).


Define absolute start address: The header code is followed by the absolute 
start address. If the routine is a subroutine, this 
code would not be used, as the module has no 
start address. When this code is used the pro- 
gram will be started at the specified address 
once loading is completed. 


Define relative start address: Similar to the preceding code; however, pro- 
gram execution will start in a position relative to 
the first location. 


One byte: The header code is followed by one byte. This 
code gets no relocation, because it is either an 
instruction without an address, or data which is 
too small to be an address. 


Two bytes absolute: The header code is followed by the 2 bytes. 
This code also receives no relocation because it 
is either an absolute address value, a 1-byte im- 
mediate instruction with its data byte, or it is a 
relative address instruction which is self- 
relocating. 


Three bytes relative: The header code is followed by the 3-byte in- 
struction. This code will receive a relocation 
factor. 


Three bytes absolute: The header code is followed by a 3-byte instruc- 
tion with an absolute address value which is un- 
changed in loading. 


Two bytes relocatable address values: The header code is followed by the address
data. The address data is always relocatable.


End: At this point, control returns to the program
that called the loader if no starting address was 
given in the loading module. If the loading 
module contained a start address that address 
is called. 


Table 1: An example of a quick relocation scheme designed with a 6800 processor in mind. This set of instructions
would be stored along with the program on the auxiliary memory to direct the loader as to how to reinsert the data into
main memory each time the program was run. The point of this scheme is to provide a minimal amount of computation
when a program is loaded from a library into memory prior to execution. Similar schemes can be chosen for any
particular computer's architecture.


have a call to position the head and
another set of calls to read, write and
verify. Separate calls cannot be used
because a second task might reposition
the heads before the first task had a
chance to read or write.

The third type of I/O device is the
device that is the system's alone. An
example of this is the clock interrupt, a
solitary interrupt device. It must be the
system's job to keep track of time. It is
also the charge of the system to keep
track of which devices are owned by
which tasks. The system must place all of
the task's allocated devices back on the
available list if a cancel-the-program
function is executed.

When a task wants to perform input or
output, it might use a considerable
amount of system time monitoring status
lines, thereby making time-sharing
impossible, unless all, or at least some of the
devices are interrupt driven. The best
way to handle things is to have a routine
which will cause a task to wait until an
interrupt is received for that task, then let
the task handle the interrupt, including
polling. So far, the routines required are
summarized in table 2. This is not to say
that these are the only routines you will
ever need; table 2 is probably the
minimum set of functions you will ever
need.

When handling disk interrupts, it is
necessary to keep track of which task, if


device, it must get a return code stating
whether or not the device is busy.
Otherwise, the system must queue its request
(ie: make the program wait and handle
the request whenever it can).
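The bookkeeping for single-owner devices can be sketched as follows. This is a hedged illustration in Python rather than anything the article prescribes: the class `DeviceTable` and its method names are invented, and the return code is modeled as a simple true or false.

```python
class DeviceTable:
    """Tracks which task, if any, owns each single-owner device."""

    def __init__(self, devices):
        # None means the device is on the available list.
        self.owner = {name: None for name in devices}

    def allocate(self, device, task):
        """Return True if the task now owns the device, False if it is busy."""
        if self.owner[device] is None:
            self.owner[device] = task
            return True
        return self.owner[device] == task

    def free(self, device, task):
        """Release a device, but only on behalf of its current owner."""
        if self.owner[device] == task:
            self.owner[device] = None

    def cancel_task(self, task):
        """Put all of a cancelled task's devices back on the available list."""
        for device, owner in self.owner.items():
            if owner == task:
                self.owner[device] = None

table = DeviceTable(["cassette", "printer"])
got_it = table.allocate("cassette", "task A")    # succeeds
refused = table.allocate("cassette", "task B")   # device is busy
```

A system that queues requests instead of returning a busy code would keep a waiting list per device in place of the `False` result.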

A third consideration is scheduling.
Each task has a status: ready to run,
running, running with an interrupt pending,
or waiting. At some point, the system
must stop running one task and begin
running another.

We will require the operating system to
schedule the tasks every time a task
asks to wait. Since that task cannot
proceed, we will perform a task that is not in
a wait state. There are three other times
when we may optionally reschedule the
tasks: every interrupt, every clock
interrupt, or every interrupt and system call.
These methods are called demand
scheduling, event scheduling, time
slicing, and quick scheduling, respectively.
The fastest method is to wait for WAIT
calls. The other three methods are fairer,
depending on how you look at things.

The actual method of scheduling leads
to another decision point. The scheduler


• Attempt to allocate a particular device. This
routine must give a return code stating
whether or not the device is already
allocated.

• Free a device.

• Read a character from a particular device.

• Write a character to a particular device.

• Read a particular disk block.

• Write a particular disk block.

• Wait.

• End a task.


Table 2: Minimum routines that are required for handling a time-sharing 
system. The end task routine should return control to the supervisory 
program with information that the task is totally finished. The last thing 
you want to do is encounter a halt instruction in the program code and 
halt the machine. 


may be foreground-background, round- 
robin, or priority scheduling. 
Foreground-background is the fastest. In 
this type of scheduling, the system scans 
down the list of tasks and runs the first 
nonwaiting task. When this method is 
used, the position on the list is the im- 
portant factor. 

Round-robin scheduling starts the 
search for an executable task after the last 
task running. The search starts at the top 
of the list when it hits the bottom. This 
way gives every task its chance to run. 

Priority scheduling requires a list of 
priorities. This scheduler runs the task 
with the highest priority which is not 
waiting. This is the fairest method 
because each task is given exactly what it 
deserves. When you run off the bottom 
of the list, using either the foreground- 
background or priority scheduling 
method, you have the option of starting 
over or executing a WAIT instruction. 
Although the WAIT instruction will cost
a byte of program memory, it will save
considerable time on a 6800 or similar
machine, since the interrupt vectoring
will be half done by the time you get the
interrupt.
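The three selection rules can be compared side by side in a short Python sketch. The task list, its field names and the convention that a larger number means a higher priority are all assumptions made for the example. Each function returns the index of the task to run next, or `None` when every task is waiting (the point at which a real system would start over or execute a WAIT).

```python
def foreground_background(tasks):
    """Run the first nonwaiting task; position on the list decides."""
    for i, task in enumerate(tasks):
        if not task["waiting"]:
            return i
    return None

def round_robin(tasks, last):
    """Search from just after the task that ran last, wrapping to the top."""
    n = len(tasks)
    for step in range(1, n + 1):
        i = (last + step) % n
        if not tasks[i]["waiting"]:
            return i
    return None

def priority(tasks):
    """Run the nonwaiting task with the highest priority."""
    ready = [i for i, task in enumerate(tasks) if not task["waiting"]]
    if not ready:
        return None
    return max(ready, key=lambda i: tasks[i]["priority"])

tasks = [
    {"name": "edit",  "waiting": True,  "priority": 3},
    {"name": "print", "waiting": False, "priority": 1},
    {"name": "asm",   "waiting": False, "priority": 2},
]
choices = (foreground_background(tasks), round_robin(tasks, last=1), priority(tasks))
```

Here the editor is waiting for a keystroke, so foreground-background picks the printer task, round-robin moves on to the assembler, and priority scheduling also picks the assembler.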


Figure 1: A system set up with each processor having its own mass storage device and I/O peripherals.


Figure 2: This arrangement uses resource sharing. To make this arrangement work, processor to processor data links must 
be added. Time-sharing and multiprogramming can be useful in the personal system. What happens when two children 
and two adults must share several terminals? What about the case when you want to do a listing or assembly on a slow 
printer while continuing an editing operation on a separate source file? The smallness of the scope of a computer does not 
rule out the use of resource sharing and multiprocessing. 




This covers most of what you need, but 
there are a few more minor considera- 
tions to follow: 


• A task has to get into the machine
somehow. Two possible methods
come to mind. One is the typical
time-sharing method with each terminal
getting its own task. The other is to add
a system call which adds a new task.

• You can set things up so that each task
has a fixed amount of memory, which
may or may not be reset between
tasks, or use some sort of a system
where the tasks can acquire and free
memory dynamically.

• Programs must be nice to one another,
as very few of the machines around
have any sort of memory protection or
privileged instructions.

• When an interrupt occurs, or a task is
otherwise stopped, the registers,
including the program status word
(PSW) and stack pointer, must be
saved and later restored. Depending
on the type of programs you run and
your type of machine you may have
to save and restore all or part of page
0. If you have a 6502, you will also
have to deal with the stack's page.

• Programs which can be run
concurrently by more than one task are
called reentrant. You may wish to set up
some way of effectively using reentrant
programs, such as having a null task
which has reentrant subroutines, or
having various small reentrant
routines, such as multiply and divide,
always in the same place in memory.


There are other methods of going 
about this completely. Many BASIC 
systems will have one BASIC interpreter 
in memory along with multiple programs, 
and will execute one line of BASIC code 
and then go on to the next pseudo-task. 
This will also work for APL, although long 
matrix operations will tend to extend the 
intervals between transitions from one 
process to another. It is a debatable point 
whether or not a time-sharing APL and 
two workspaces will ever fit into the same 
memory at one time. 
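The one-line-at-a-time idea can be imitated with Python generators, which hand control back after each step much as an interpreter could after each line. This is only an analogy under stated assumptions: `program` stands in for a BASIC program, and the "line" it executes is just a label.

```python
def program(name, line_numbers):
    """A pseudo-task that gives up control after each line it executes."""
    for number in line_numbers:
        yield f"{name}:{number}"

def interleave(tasks):
    """Execute one line of each pseudo-task in turn until all have finished."""
    trace = []
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                trace.append(next(task))
            except StopIteration:      # this program has run off its end
                tasks.remove(task)
    return trace

trace = interleave([program("A", [10, 20, 30]), program("B", [10, 20])])
```

No registers need saving between pseudo-tasks, because each switch happens at a point where the interpreter's own state is tidy. The price is that one slow line delays everyone.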

Multiple-processor time-sharing 
systems are also possible. Assuming that 
you have a central processor with disks 
and printers, there is a method that can 
save a lot of money. This method is 
resource sharing. Figure 1 shows a typical 
group of three computers each working 
independently. Each processor handles 
everything with inefficient use of the 
printers and disks. Figure 2 depicts a 
resource sharing setup. This requires the 
addition of processor to processor data 
links. In this setup, each peripheral
processor does the computing while the
central processor handles queued I/O and
interrupts much like the simple
time-sharing systems above.




Designing a Command Language


G A Van den Bout 


Nearly every system, whether it is
composed of ten lines of code or ten
thousand lines of code, will perform
three distinct functions. It will receive
input from the user, it will process this
input and it will output the results. Of
these three functions, the one which
undoubtedly receives the least attention
from the system designer is the
communication from the user of the system
to the system itself.

Hours and hours may be spent
perfecting a processing algorithm and
computing field lengths so that the
resulting output can be instantly
understood, yet due to the lack of
consideration put into the input stage of the
system, the user may be forced to plow
through a series of questions and
answers directed to him by the system.
This is a situation which would try the
patience of even the most tolerant
person. Sometimes a situation even worse
than this series of questions may be
caused by the designer who is very
familiar with the system. In an effort to
save time and memory space, the
designer may decide to reduce or even
entirely omit any prompting by the
program. This leaves the decision of what
information must be entered to the
intuition of the user, or to a system manual
which will probably not be around when
it is needed.

A good solution to the problem would
be a well-designed command language
which would allow the user to supply all
the information needed by the program
at one time, in a single command. Then,
if any of the required data has not been
entered, the computer can prompt the
user for the remaining items. This
method allows for both the experienced
user who knows exactly what data the
program needs at every instant and for
the first-time user who requires some
help from the system now and then, but
who will soon become familiar with the
system and probably prefer to avoid the
repetitious prompting.

Consider the following example 
which, although hypothetical and not 
necessarily typical of chess playing pro- 
grams in general, illustrates problems 
which do exist in many systems. A 
superb chess playing program has been 
designed after months of hard work. 
Along with this program, a graphics out- 
put system has been devised to display 
the present formation of the board after 
each move is made. When the user sits 
down to test his skill against that of the 
machine, he becomes a partner to the 
following dialogue: 


(C: COMPUTER; P: PLAYER)

C: DO YOU WISH TO MOVE(1), CAPTURE(2), OR
CASTLE(3)? ENTER 1, 2, OR 3.

P: 1

C: ENTER NUMBER (1-8) OF ROW THAT PIECE IS
ON.

P: 2

C: ENTER LETTER (A-Z) OF COLUMN THAT
PIECE IS ON.

P: D

C: ENTER NUMBER (1-8) OF ROW TO WHICH
YOU ARE MOVING.

P: 4


No matter how well the machine
plays chess, it is doubtful whether it will
be used by any particular person for
more than a few games. Despite the
thought that went into the rest of the
program, no creative thought was put
into the command language for the
system.


Figure 1: A finite-state
machine with one initial state
and three final states that is
capable of recognizing the
words: sat, sog, sogs, hat, hog
and hogs.

Now, consider the following conversa- 
tion between the computer and the 
player. 


C: ENTER YOUR FIRST MOVE.

P: MOVE FROM D2 TO D4

C: I MOVE FROM H5 TO E2. CHECK.

P: CAPTURE E2

C: FROM WHERE?

P: H2

C: ...


This method not only reduces the un- 
necessary chatter that was encountered 
in the first case, but gives the player 
credit for possessing some knowledge of 
what is happening in the game. By tak- 
ing time to design an easy-to-use com- 
mand language, the designer can pro- 
duce a game that will not only play well 
but will also be enjoyable to use. 

The problem encountered when 
designing a program which handles a set 
of commands such as these is that often 
no organized approach is taken to 
assure that the allowable commands are 
processed correctly. Each input string 
may be scanned and rescanned for the 
information needed by the program. 
This type of haphazard approach will 
very likely produce unreadable code 
that is hard to debug and may contain 
hidden errors and ambiguities. To avoid 
these problems, the theory of finite-state 
machines (FSMs) may be used to pro- 
duce a recognizer program which can 
parse the input commands and produce 
a structured command which can be in- 
terpreted by the system. 


Finite-State Machines 


Since the aim of this article is to show
how to use finite-state machines to aid
in programming a command language,
not to thoroughly cover FSM theory, I
will give a rather informal description of
the machines. The representation used
here has appeared in various places, and
was chosen mainly because of its
simplicity for this application.

Consider the finite-state machine 
shown in figure 1. Each circle represents 
a state of the finite state machine. In this 
example there are seven states: S, 1, 2, 3, 
F1, F2 and F3. The names chosen for the 
states are arbitrary. The directed lines 
between the states are called state tran- 
sition paths. The state transition path, 
labeled with an H, located between 
state S and state 1, is named S-1(H). The 
parenthetical symbol will be omitted 
when there is no ambiguity, such as the 
path 1-3. The states which are circled 
twice are final states. The final states in 
figure 1 are F1, F2 and F3. The states 
which are pointed to by arrows which 
lead from no other state are called in- 
itial states. The only initial state in figure 
1 is S. 

This finite-state machine can be used
to recognize several different strings, a
string in this case being merely a
sequence of letters. For a particular string
to be recognized, an ordered path must
exist between an initial state and a final
state, such that every symbol in the
recognized string exists in its original
order along the path starting at the
initial state. Using this finite-state
machine the string HOG is recognized in
the following manner. Starting at initial
state S, the first symbol in the string, H,
leads to state 1 along path S-1(H). The
second symbol, the letter O, selects path
1-3 leading to state 3. Finally, the symbol
G leads to the final state F2 via the path
3-F2. Since this path exists from the
initial state S to the final state F2, the
string has been recognized. The other
strings which can be recognized by this
FSM are SAT, HAT, SOG, SOGS and
HOGS.
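A machine like this translates directly into a lookup table. In the Python sketch below, the paths S-1(H), 1-3 and 3-F2 come from the walk-through above; the remaining paths are an assumed reconstruction chosen so that the seven named states accept exactly the six listed words, since the figure itself is not reproduced here.

```python
# (current state, input symbol) -> next state
TRANSITIONS = {
    ("S", "H"): "1",
    ("S", "S"): "1",    # assumed: the second path from S to 1
    ("1", "A"): "2",    # assumed
    ("2", "T"): "F1",   # assumed
    ("1", "O"): "3",
    ("3", "G"): "F2",
    ("F2", "S"): "F3",  # assumed
}
FINAL_STATES = {"F1", "F2", "F3"}

def recognizes(string):
    """Follow the state transition paths; accept if we end in a final state."""
    state = "S"
    for symbol in string.upper():
        state = TRANSITIONS.get((state, symbol))
        if state is None:          # no path for this symbol: reject
            return False
    return state in FINAL_STATES

results = {word: recognizes(word) for word in ["HOG", "HOGS", "SAT", "HO", "SATS"]}
```

Note that F2 is both a final state and a state with an outgoing path, which is how HOG and HOGS can both be accepted.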


State transition paths need not
proceed to a new state. A state transition
path may return to a previous state or
may even return to the state from which
it started. Figure 2 is an example of a
finite-state machine which will
recognize any string which begins and
ends with an A and which has zero or
more Bs between the two As, such as the
strings: AA, ABA, ABBA, etc.
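A looping path is just a table entry whose next state equals its current state. The state names S, 1 and F in this Python sketch are assumed, as figure 2 is not reproduced here; only the accepted language (an A, zero or more Bs, an A) is taken from the text.

```python
# (current state, input symbol) -> next state
LOOP_FSM = {
    ("S", "A"): "1",
    ("1", "B"): "1",   # the loop: a B leaves the machine in state 1
    ("1", "A"): "F",
}

def accepts(string, table=LOOP_FSM, start="S", finals=("F",)):
    """Table-driven recognizer; rejects as soon as no path matches."""
    state = start
    for symbol in string:
        state = table.get((state, symbol))
        if state is None:
            return False
    return state in finals

checks = [accepts(s) for s in ["AA", "ABA", "ABBA", "AB", "ABAB"]]
```

The loop lets a machine with three states accept infinitely many strings, something the loop-free machine of figure 1 cannot do.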


Sample Problem 


Now that the basics of finite-state 
machines have been explained, a simple 
command language will be defined and 
implemented using them as a design 
tool. Using this example, a similar pro- 
cedure can be followed to produce a 
recognizing program for nearly any 
command language that might be 
chosen. 

Assume that there is a game played 
on a chess board. The columns of the 
board are labeled with the letters A thru 
H and the rows of the board are labeled 
with the numbers 1 thru 8. The three 
possible moves which may be made by
any player consist of moving a piece
from one square to another (MOVE),
moving a piece to another square and
capturing the piece on that square (CAP),
or removing one of his own pieces from
the board (TAKE). Some examples of
commands which are to be accepted by 
the program are: 


MOVE FROM A1 TO C3
CAP FROM 4H TO H1
TAKE FROM E5
MOVE TO F6 FROM 6G


It can be seen that the commands are 
made up of six basic entities that must 
be recognizable. Three of these entities 
are the commands, MOVE, CAP and 
TAKE. TO and FROM are keywords 
which must be identified in order to in- 
terpret a command. The final type is a 
position which may consist of a letter 
followed by a number or a number 
followed by a letter and will exist one or 
more times in each command. 


Figure 2: Finite-state machine that has a state transition
path loop.


Command Recognizers

When a command is entered to be
interpreted by the computer, it consists
merely of a sequence of symbols
(letters, numbers and spaces) having no
syntactic meaning of their own. The
meaning only starts to become clear when the
symbols are grouped together to form
tokens. The tokens existing in this game
are the six entities described above.
These tokens will be referred to as
<MOVE>, <CAP>, <TAKE>, <TO>,
<FROM>, <POS>. A finite-state
machine which will recognize each of
these tokens is shown in figure 3. Blanks
are shown on this diagram and in the
following diagrams as small squares.
Note that one new token has been
added to the six types listed above. This new
token is <END>, recognized when an
end of line (EOL) delimiter is found.

Most of this finite-state machine is
self-explanatory. Note, however, the two
states L15 and L23 which are entered
after matching an initial C or F,
respectively. These states represent a point in
the matching process where the token
being recognized may be either a
command (<CAP> or <FROM>) or a
position (<POS>). When the next symbol in
the input stream is examined, the
recognition of the token as a position
(paths L15-L20 and L23-L20) or as a
command (paths L15-L16 and L23-L24) can be
made.
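For comparison, here is a compact Python sketch of the same grouping job. It takes a shortcut that should be stated plainly: instead of stepping through states symbol by symbol as the machine in figure 3 does, it splits the line at blanks and classifies each whole word, so the C and F ambiguity is settled by looking at the complete word. The `(kind, value)` pairs it returns are an assumed representation chosen for readability.

```python
KEYWORDS = {"MOVE", "CAP", "TAKE", "TO", "FROM"}
COLUMNS = set("ABCDEFGH")
ROWS = set("12345678")

def classify(word):
    """Return (kind, value) for one blank-delimited word, or raise ValueError."""
    if word in KEYWORDS:
        return (word, None)
    if len(word) == 2:
        first, second = word
        if first in COLUMNS and second in ROWS:    # letter then number: C3
            return ("POS", (first, int(second)))
        if first in ROWS and second in COLUMNS:    # number then letter: 4H
            return ("POS", (second, int(first)))
    raise ValueError(f"unrecognized token: {word!r}")

def tokens(line):
    """Group the symbols of a command line into tokens, ending with END."""
    result = [classify(word) for word in line.upper().split()]
    result.append(("END", None))
    return result

example = tokens("CAP FROM 4H TO H1")
```

Both position spellings are normalized to (column, row), so later stages need not care which order the user typed.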

The finite-state machine described 
performs the process known as lexical 
analysis, the process of grouping 
together input symbols to determine the 
input tokens. The next process to be per- 
formed is the process of syntactic 
analysis, checking the order of the 
tokens to see if they form a valid com- 
mand. For example, the two “com- 
mands”: 


MOVE FROM A1 TO C3 
A1 C3 FROM TO MOVE 


are both composed of valid tokens for 
the example language, but only the first 
command is syntactically correct. To 
determine the syntactic correctness of a 
command another finite-state machine 
must be designed. This machine, rather
than having paths labeled with symbols
from a character set, will have labels
that are valid tokens of the language
being processed. Figure 4 shows a
finite-state machine which will accept the
valid commands of the language.


Semantic Routines 


At this point two finite-state machines 
have been produced to be used to 
recognize valid commands for the 
game. Before these machines are used 
to help produce code to process actual 
commands, the results of processing 
each command must be defined. After a 
decision has been made regarding these 
results, semantic routines, routines to 
carry out the processing of the various 


1,2,3,4,5,6,7.8 


1,2,3,4,5,6,7.8 


ore 


commands, should be associated with 
each state transition path of the finite- 
state machines. In our system, each 
command will be converted to a set of 
codes and placed in an array called 
COMMAND which will have five 
elements. COMMAND(1) will be set to a 
code describing the command operation 
(1=MOVE, 2=CAP, 3=TAKE), COM- 
MAND(2) and COMMAND({3) will hold, 
respectively, the column and the row 
position associated with the FROM 
keyword. COMMAND(4) and COM- 
MAND(5) will hold the column and row 
position associated with the TO 
keyword. Figure 5 shows the expected
results of processing the following two
commands:


MOVE TO C1 FROM H6
TAKE FROM A7


Figure 3: A lexical finite-state machine for recognizing the entities that will be accepted: <TO>, <TAKE>, <MOVE>, <CAP>, <FROM>, <END>, <POS>.


For the FSM that is shown in figure 4, 
table 1 shows the semantics which will 
produce the desired results. Routines for 
paths such as S1-S2 (<MOVE>) set the
first element of the COMMAND array to
indicate which command was recognized.
Path S2-S3 is an implicit recognition
of the word FROM and has no semantics
associated with it since nothing
must be done until the path S3-S4 is
traversed. When this action occurs, the row
and column are stored in the COM- 
MAND array to indicate the FROM posi- 
tion. When a final state is reached, an 
entire command has been parsed and 
the COMMAND array contains all of the 
necessary information to fully describe 
the command. 

The lexical finite-state machine shown 
in figure 3 will be used by the syntactic 
finite-state machine just described to 
obtain tokens from the input stream 
when they are needed. The output from 
the lexical finite state machine will be a 
two-element array named TOKEN which 
will contain the following codes. If the 
token is <POS>, then the first element 
of TOKEN will be the row number and 
the second element will be the column 
letter. If the token is not <POS>, then 
the first element of TOKEN array will be 
set to 0 and the second element will be a 
code indicating which type of token was 
recognized (1 for <MOVE>, 2 for
<CAP>, 3 for <TAKE>, 4 for <TO>,
5 for <FROM>, 6 for <END>). The 


Figure 4: A syntactic finite-state machine for accepting valid commands. 


semantic routines associated with the 
lexical finite state machine to set 
TOKEN correctly are shown in table 2. 


Implementation 


The first step in implementing the 
command language is the conversion of 
the lexical FSM into a subroutine which 
locates the next token in the input 
stream and places the necessary codes 
into TOKEN as described above. If at 
any time, an error is detected while 
attempting to recognize a new token 
from the input stream, then TOKEN(1) is 
set to 0, TOKEN(2) is set to 7 and this 
routine returns to its calling routine. 

A program named LEX, written in a 
BASIC-like language, which ac- 
complishes these results is shown in 
listing 1. Prior to the invocation of this 
routine, the input command must be ob- 


Figure 5: Two example COMMAND ar- 
rays. COMMAND array A results after 
processing the command MOVE TO C1 
FROM H6. COMMAND array B is the 
result of processing TAKE FROM A7. 




Table 1: Semantics for the syntactic finite-state machine.

PATH              SEMANTICS
S1-S2 (<MOVE>)    SET COMMAND(1) TO 1
S1-S2 (<CAP>)     SET COMMAND(1) TO 2
S1-S3             SET COMMAND(1) TO 3
S4-S7             SET COMMAND(2) TO COLUMN (A-H)
                  SET COMMAND(3) TO ROW (1-8)
S10-S13           SET COMMAND(4) TO COLUMN (A-H)
                  SET COMMAND(5) TO ROW (1-8)
S8-S9             SET COMMAND(4) TO COLUMN (A-H)
                  SET COMMAND(5) TO ROW (1-8)
S10-S6            SET COMMAND(2) TO COLUMN (A-H)
                  SET COMMAND(3) TO ROW (1-8)
S12-S13           SET COMMAND(2) TO COLUMN (A-H)
                  SET COMMAND(3) TO ROW (1-8)
OTHERS            (NO SEMANTICS)


Table 2: Semantics for the lexical finite-state machine. These routines are used to set up the array TOKEN.

PATH              SEMANTICS
L1-L2             SET TOKEN(1) TO 0
                  SET TOKEN(2) TO 6
L4-L5             SET TOKEN(1) TO 0
                  SET TOKEN(2) TO 4
L8-L9             SET TOKEN(1) TO 0
                  SET TOKEN(2) TO 3
L13-L14           SET TOKEN(1) TO 0
                  SET TOKEN(2) TO 1
L17-L18           SET TOKEN(1) TO 0
                  SET TOKEN(2) TO 2
L26-L27           SET TOKEN(1) TO 0
                  SET TOKEN(2) TO 5
L1-L19            SET TOKEN(2) TO INPUT CHARACTER
L1-L22            SET TOKEN(1) TO INPUT CHARACTER
L19-L20           SET TOKEN(1) TO INPUT CHARACTER
L22-L20           SET TOKEN(2) TO INPUT CHARACTER
L15-L20           SET TOKEN(1) TO INPUT CHARACTER
                  SET TOKEN(2) TO "C"
L23-L20           SET TOKEN(1) TO INPUT CHARACTER
                  SET TOKEN(2) TO "F"
OTHERS            (NO SEMANTICS)


tained from the user and stored in a buf- 
fer followed by a blank and the end of 
line character. A routine RCHAR is 
assumed to exist. This reads the next 
character from the input buffer and 
places it into the variable CHAR. 
Because of the way that the program has 
been designed, the flow of the program 
is easy to understand and modifications 
are easy to make if necessary, especially 
if the corresponding finite-state machine 
diagram is available. The program is 
divided into sections corresponding to 
the states in the finite-state machine. 
Each section determines which state- 
transition pointer should be followed 
from the character which is being 
scanned. It then performs the semantics 
associated with this state-transition 
pointer and moves along the path by 
means of the appropriate GO-TO state- 
ment. If during the processing of any 
state, the input character being exam- 
ined does not correspond with any valid 
state transition pointer, the routine sets 
TOKEN to the error code described 
above and returns to its caller. 
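
The same lexical machine can also be sketched with its states held as data instead of labeled program sections. The following is a minimal sketch in Python, not the BASIC-like language of listing 1, and it swaps the GO-TO structure for a loop over a state variable. The names lex, tokens, and PREFIXES are hypothetical, each state is simply named by the characters matched so far, and the row is stored as a number rather than as the input character. The token codes are the ones given in the text.

```python
KEYWORDS = {"MOVE": 1, "CAP": 2, "TAKE": 3, "TO": 4, "FROM": 5}
END, ERROR = 6, 7
COLS, ROWS = "ABCDEFGH", "12345678"
# Every prefix of a keyword is one state of the machine.
PREFIXES = {w[:i] for w in KEYWORDS for i in range(1, len(w) + 1)}

def lex(buf, pos=0):
    """Scan one token from buf starting at pos; return (TOKEN, next_pos).

    Assumes buf ends with a blank followed by '#', as the text requires.
    """
    while buf[pos] == " ":                  # beginning state: skip blanks
        pos += 1
    if buf[pos] == "#":
        return [0, END], pos
    state, token = "", [0, ERROR]
    while True:
        ch = buf[pos]
        pos += 1
        if ch == " ":                       # a blank ends every token
            if state in KEYWORDS:
                return [0, KEYWORDS[state]], pos
            return (token if state == "POS" else [0, ERROR]), pos
        if state == "" and ch in ROWS:      # row number seen first
            state, token = "ROW", [int(ch), ""]
        elif state == "ROW" and ch in COLS:
            state, token = "POS", [token[0], ch]
        elif len(state) == 1 and state in COLS and ch in ROWS:
            state, token = "POS", [int(ch), state]   # includes C and F
        elif state + ch in PREFIXES or (state == "" and ch in COLS):
            state += ch                     # follow a keyword or column path
        else:
            return [0, ERROR], pos          # no valid transition

def tokens(line):
    """Hypothetical helper: lex a whole command line into a list of tokens."""
    buf, pos, out = line + " #", 0, []
    while True:
        tok, pos = lex(buf, pos)
        out.append(tok)
        if tok[0] == 0 and tok[1] in (END, ERROR):
            return out
```

Calling tokens("MOVE FROM A1 TO C3") yields one TOKEN pair per entity, ending with the <END> code. The ambiguity of C and F is resolved exactly as in figure 3: by looking at the next symbol.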

Listing 2 shows the routine con- 
structed from the syntactic finite-state 
machine. The structure of this program 
is almost identical to the structure of the 
previous routine. This time each section 
of the program examines the next token 
which has been obtained by a call to 
LEX, performs the appropriate semantics 
for the path to be traversed, and then 
moves to the next defined state. Again, 
if either an invalid token is encountered 


or if the routine LEX returns an error 
code, this routine returns to its caller 
after leaving an error code of 0 in COM- 
MAND. 
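
A syntactic routine of this kind can be sketched as a transition table keyed by state and token. The sketch below is in Python and makes several assumptions: the state names are hypothetical (figure 4 is not reproduced here, so they are not the S-numbers of the article), the accepted forms are limited to those used as examples in the text, and COMMAND carries an unused element 0 so that the 1-based indexes of the article can be kept. Tokens are the two-element TOKEN pairs described above, with the row held as a number.

```python
MOVE, CAP, TAKE, TO, FROM, END = 1, 2, 3, 4, 5, 6

def kind(tok):
    """Classify a TOKEN pair: a position has a non-zero first element."""
    return "POS" if tok[0] > 0 else tok[1]

# (state, token kind) -> (next state, first COMMAND slot to fill, 0 = none)
TRANSITIONS = {
    ("start", MOVE): ("verb", 0), ("start", CAP): ("verb", 0),
    ("start", TAKE): ("take", 0),
    ("verb", FROM): ("from1", 0), ("from1", "POS"): ("from2", 2),
    ("from2", TO): ("to2", 0), ("to2", "POS"): ("done", 4),
    ("verb", TO): ("to1", 0), ("to1", "POS"): ("to1p", 4),
    ("to1p", FROM): ("from3", 0), ("from3", "POS"): ("done", 2),
    ("take", FROM): ("from4", 0), ("from4", "POS"): ("done", 2),
    ("done", END): ("accept", 0),
}

def syn(token_list):
    """Return COMMAND (element 0 unused); COMMAND[1] is 0 on any error."""
    command = [None, 0, None, None, None, None]
    state = "start"
    for tok in token_list:
        k = kind(tok)
        step = TRANSITIONS.get((state, k))
        if step is None:                    # invalid token for this state
            command[1] = 0
            return command
        if state == "start":
            command[1] = k                  # 1=MOVE, 2=CAP, 3=TAKE
        state, slot = step
        if slot:                            # semantics: store column, row
            command[slot], command[slot + 1] = tok[1], tok[0]
        if state == "accept":
            return command
    command[1] = 0                          # ran out of tokens
    return command
```

For the two commands of figure 5, this sketch fills COMMAND with 1, H, 6, C, 1 and with 3, A, 7 (the last two elements unused) respectively.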

Due to the way these routines were
constructed, a single error code is
returned if any error occurs in a
command. But because the exact location in
the state diagram is known whenever an
error occurs, more descriptive error
messages can be generated, or fix-up
action may be performed. If the command:


MOVE TO A8 


is entered, then the syntactic routine 
would encounter the <END> token 
while processing state S8. Based on the
present form of the program, the error 
message printed would most likely be 
“INVALID COMMAND SYNTAX — 
ENTER NEW COMMAND” since no at- 
tempt is made to analyze the syntax er- 
ror. 

However, instead of merely returning 
the zero error code to its caller, the
syntactic routine could return a unique
code to indicate that the FROM section 
of the command is missing. The calling 
routine could then prompt the user for 
the coordinates of the piece which is to 
be moved. Depending on the extent to 
which this error checking is carried out, 
a very elaborate and easy to use com- 
mand system can be created. 
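
The idea of a unique error code per state can be sketched as follows. This Python fragment is only an illustration: it follows a single path of the syntactic machine (MOVE TO <POS> FROM <POS> <END>) as a simple chain of states, and the names PATH, PROMPTS, check, and message, as well as the wording of the prompt, are hypothetical.

```python
# One path through the syntactic machine, as a chain of expected tokens.
PATH = ["MOVE", "TO", "POS", "FROM", "POS", "END"]

# Error code -> prompt. Code n means the n-th expected token was not found.
PROMPTS = {
    4: "ENTER THE POSITION OF THE PIECE TO BE MOVED",
}

def check(kinds):
    """Return 0 if kinds follows PATH, else the number of the failing state."""
    for state, expected in enumerate(PATH):
        if state >= len(kinds) or kinds[state] != expected:
            return state + 1
    return 0

def message(code):
    """Pick a specific prompt when one exists, else the general message."""
    return PROMPTS.get(code, "INVALID COMMAND SYNTAX - ENTER NEW COMMAND")
```

For the command MOVE TO A8, check(["MOVE", "TO", "POS", "END"]) returns 4, because <END> arrives where <FROM> was expected, and the caller can prompt for the missing position instead of rejecting the whole command.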


Other Representations 
The FSM diagrams in figures 3 and 4




Listing 1: Routine constructed for the lexical finite-state machine. 


LEX IS A SUBROUTINE WHICH EXAMINES INPUT
CHARACTERS UNTIL IT FINDS A VALID TOKEN OR
AN INPUT ERROR. SUBROUTINE RCHAR READS THE
NEXT CHARACTER FROM THE INPUT BUFFER INTO
CHAR. '#' IS THE END-OF-BUFFER CHARACTER.

LEX SETS TOKEN (THE TWO ELEMENT ARRAY) TO
THE FOLLOWING CODES:

             TOKEN(1)    TOKEN(2)
<MOVE>   -   0           1
<CAP>    -   0           2
<TAKE>   -   0           3
<TO>     -   0           4
<FROM>   -   0           5
<END>    -   0           6
ERROR    -   0           7
<POS>    -   ROW: 1-8    COL: A-H


LEX:    SUBROUTINE;
        TOKEN(1) = 0;

STATE 1 - BEGINNING STATE
L1:     CALL RCHAR( );
        IF CHAR = ' ' THEN GO TO L1;
        IF CHAR = 'T' THEN GO TO L3;
        IF CHAR = 'M' THEN GO TO L10;
        IF CHAR = 'C' THEN GO TO L15;
        IF CHAR = 'F' THEN GO TO L23;
        IF CHAR = '#' THEN DO;
            TOKEN(2) = 6;
            RETURN;
        END;
        IF CHAR = 'A' | 'B' | 'D' | 'E' | 'G' |
            'H' THEN DO;
            TOKEN(2) = CHAR;
            GO TO L19;
        END;
        IF CHAR = '1' | '2' | '3' | '4' | '5' |
            '6' | '7' | '8' THEN DO;
            TOKEN(1) = CHAR;
            GO TO L22;
        END;
        GO TO LEXERR;

STATE 3 - HAVE FOUND 'T'
L3:     CALL RCHAR( );
        IF CHAR = 'O' THEN GO TO L4;
        IF CHAR = 'A' THEN GO TO L6;
        GO TO LEXERR;

STATE 4 - HAVE FOUND <TO>
L4:     CALL RCHAR( );
        IF CHAR = ' ' THEN DO;
            TOKEN(2) = 4;
            RETURN;
        END;
        GO TO LEXERR;

STATE 6 - HAVE FOUND 'TA'
L6:     CALL RCHAR( );
        IF CHAR = 'K' THEN GO TO L7;
        GO TO LEXERR;

STATE 7 - HAVE FOUND 'TAK'
L7:     CALL RCHAR( );
        IF CHAR = 'E' THEN GO TO L8;
        GO TO LEXERR;

STATE 8 - HAVE FOUND <TAKE>
L8:     CALL RCHAR( );
        IF CHAR = ' ' THEN DO;
            TOKEN(2) = 3;
            RETURN;
        END;
        GO TO LEXERR;

STATES 10 THRU 13 ARE VERY SIMILAR
TO STATES 3 THRU 8 ABOVE AND ARE
NOT SHOWN.

STATE 15 - HAVE FOUND 'C'
L15:    CALL RCHAR( );
        IF CHAR = '1' | '2' | '3' | '4' | '5' |
            '6' | '7' | '8' THEN DO;
            TOKEN(1) = CHAR;
            TOKEN(2) = 'C';
            GO TO L20;
        END;
        IF CHAR = 'A' THEN GO TO L16;
        GO TO LEXERR;

STATES 16 AND 17 RECOGNIZE THE REST OF
<CAP> AND ARE NOT SHOWN.

STATE 19 - HAVE FOUND COLUMN LETTER (A-H)
L19:    CALL RCHAR( );
        IF CHAR = '1' | '2' | '3' | '4' | '5' |
            '6' | '7' | '8' THEN DO;
            TOKEN(1) = CHAR;
            GO TO L20;
        END;
        GO TO LEXERR;

STATE 20 - HAVE FOUND <POS>
L20:    CALL RCHAR( );
        IF CHAR = ' ' THEN RETURN;
        GO TO LEXERR;

STATE 22 - HAVE FOUND ROW NUMBER (1-8)
L22:    CALL RCHAR( );
        IF CHAR = 'A' | 'B' | 'C' | 'D' | 'E' |
            'F' | 'G' | 'H' THEN DO;
            TOKEN(2) = CHAR;
            GO TO L20;
        END;
        GO TO LEXERR;

STATE 23 - HAVE FOUND 'F'
L23:    CALL RCHAR( );
        IF CHAR = '1' | '2' | '3' | '4' | '5' |
            '6' | '7' | '8' THEN DO;
            TOKEN(1) = CHAR;
            TOKEN(2) = 'F';
            GO TO L20;
        END;
        IF CHAR = 'R' THEN GO TO L24;
        GO TO LEXERR;

STATES 24 THRU 26 ARE SIMILAR TO OTHER
STATES WHICH RECOGNIZE KEYWORDS AND ARE
NOT SHOWN.

LEXERR - AN ERROR HAS BEEN ENCOUNTERED
IN THE INPUT STRING.
LEXERR: TOKEN(1) = 0;
        TOKEN(2) = 7;
        RETURN;

        END LEX;



Listing 2: Routine constructed for the syntactical finite-state machine. 




SYN IS A SUBROUTINE WHICH EXAMINES INPUT
TOKENS TO DETERMINE IF A COMMAND IS OR IS
NOT VALID. SYN USES SUBROUTINE LEX TO
OBTAIN THE TOKENS FROM THE INPUT STREAM.
A FIVE ELEMENT ARRAY NAMED COMMAND IS
SET USING THE FOLLOWING CODES:

COMMAND(1) : 0=ERROR, 1=MOVE, 2=CAP, 3=TAKE.
COMMAND(2) : COLUMN (A-H) OF "FROM".
COMMAND(3) : ROW (1-8) OF "FROM".
COMMAND(4) : COLUMN (A-H) OF "TO".
COMMAND(5) : ROW (1-8) OF "TO".


        SUBROUTINE;

STATE 1 - BEGINNING STATE
S1:     CALL LEX( );
        IF TOKEN(1)=0 & TOKEN(2)=1 THEN DO;
            COMMAND(1) = 1;
            GO TO S2;
        END;
        IF TOKEN(1)=0 & TOKEN(2)=2 THEN DO;
            COMMAND(1) = 2;
            GO TO S2;
        END;
        IF TOKEN(1)=0 & TOKEN(2)=3 THEN DO;
            COMMAND(1) = 3;
            GO TO S3;
        END;
        GO TO SYNERR;

STATE 2 - <MOVE> OR <CAP> FOUND
S2:     CALL LEX( );
        IF TOKEN(1)=0 & TOKEN(2)=5 THEN GO TO S3;
        IF TOKEN(1)=0 & TOKEN(2)=4 THEN GO TO S8;
        GO TO SYNERR;

STATE 3 - <MOVE> <FROM> FOUND
S3:     CALL LEX( );
        IF TOKEN(1)>0 THEN DO;
            COMMAND(2) = TOKEN(2);
            COMMAND(3) = TOKEN(1);
            GO TO S4;
        END;
        GO TO SYNERR;

STATE 4 - <MOVE> <FROM> <POS> FOUND
S4:     CALL LEX( );
        IF TOKEN(1)=0 & TOKEN(2)=4 THEN GO TO S5;
        GO TO SYNERR;

STATE 5 - <MOVE> <FROM> <POS> <TO> FOUND
S5:     CALL LEX( );
        IF TOKEN(1)>0 THEN DO;
            COMMAND(4) = TOKEN(2);
            COMMAND(5) = TOKEN(1);
            GO TO S6;
        END;
        GO TO SYNERR;

STATE 6 - ENTIRE COMMAND FOUND
S6:     CALL LEX( );
        IF TOKEN(1)=0 & TOKEN(2)=6 THEN RETURN;
        GO TO SYNERR;

STATES 8 THRU 13 ARE VERY SIMILAR TO STATES
2 THRU 6 AND ARE NOT SHOWN.

SYNERR - INVALID COMMAND SYNTAX.
SYNERR: COMMAND(1) = 0;
        RETURN;


have been chosen to illustrate the 
techniques of using finite-state 
machines for designing command 
languages and do not represent the only 
way to implement this sample command 
language. An alternate machine that 
performs lexical analysis for the exam- 
ple game is shown in figure 6. In this 
machine all of the commands and 
keywords (MOVE, CAP, TAKE, TO and 
FROM) map into the single token 
<KEYWORD>. Semantic routines 
associated with the paths L1-L6, L1-L7, 
L6-L7, and L7-L7 would be used to save 
the symbols which have already been 
matched. Then when path L7-L8 is 
traversed, the semantics associated with 
this path would include a table-lookup 
routine to identify the command or 
keyword and correctly fill in the TOKEN 
array. 

To illustrate this technique, observe 
how the finite-state machine in figure 6 
would recognize the capture command. 
Starting with state L1, the C would cause 
the traversal of path L1-L6 and would be 
saved to later help identify the token be- 
ing parsed. The A and the P would 
similarly cause the program to move 
along the paths L6-L7 and L7-L7, respec- 
tively, and again these letters would be 
saved by the semantics associated with 
these paths. Finally, the ending blank 
would cause the traversal of path L7-L8. 
At this time, the semantics associated 
with path L7-L8 would examine the 
saved letters, identify the parsed word 
as either a valid token or an invalid 
word, and correctly fill in the TOKEN ar- 
ray with the code for the token or the 
error code. 
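
The keyword path of this alternate machine can be sketched in a few lines of Python. This is an illustration under stated assumptions: figure 6 is not reproduced here, so the choice of which first letters follow path L1-L6 (taken to be C and F, which could also begin a position) and which follow L1-L7 (taken to be M and T) is inferred from the text, the position paths are left out, and the names lex_keyword and KEYWORD_CODES are hypothetical.

```python
KEYWORD_CODES = {"MOVE": 1, "CAP": 2, "TAKE": 3, "TO": 4, "FROM": 5}
ERROR = 7

def lex_keyword(buf, pos=0):
    """Recognize one <KEYWORD> in buf from pos; return (TOKEN, next_pos)."""
    saved, state = "", "L1"
    while True:
        ch = buf[pos]
        pos += 1
        if state == "L1" and ch in "CF":
            state = "L6"                 # path L1-L6: could still be a position
        elif state == "L1" and ch in "MT":
            state = "L7"                 # path L1-L7: can only be a keyword
        elif state in ("L6", "L7") and ch.isalpha():
            state = "L7"                 # paths L6-L7 and L7-L7
        elif state == "L7" and ch == " ":
            # Path L7-L8: table lookup on the saved symbols.
            return [0, KEYWORD_CODES.get(saved, ERROR)], pos
        else:
            return [0, ERROR], pos
        saved += ch                      # semantics: save the matched symbol
```

Adding a keyword to this version means adding one entry to KEYWORD_CODES, where the machine of figure 3 would need a new chain of states for every letter of the word.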

Certain advantages exist for both the 
method used in the finite-state machine 
in figure 3 and for this method but as the 
number of keywords increases, this 
method becomes much more efficient in 
terms of memory used. 


Conclusion 


Now you have been shown how finite-state
machine theory may be applied to
produce correct and well-structured
code for command recognizers. I have
used finite-state machines to produce
both an information retrieval command
language and a FORTRAN free-format
input processor of character strings and
numbers, and methods similar to those
shown here have significantly speeded
up the implementations. The efficiency
of this method will vary depending on
the language used to program the
procedures and on the programming
techniques. The sample programs previously
shown were designed with clarity in
mind and are not the most efficient
routines which could have been written.
I would recommend that the lexical FSM
be coded in assembly language if
possible since many techniques exist to
improve the performance of
character-by-character scanning and
comparison. Of course, both of the
routines may be written in any language
desired, but because of the memory space
limitations of most small computers,
assembly language would probably be an
asset. As memory size increases,
however, the advantages of assembler
tend to decrease. Whichever language is
chosen, the finite-state machine method
of designing a command language should
produce a system that runs correctly
after less programming effort, can be
more readily understood and changed as
necessary, and can provide a series of
error and prompting messages that help
to make the system easier and more
enjoyable to use.


Figure 6: An alternate solution for the lexical analysis of the game program.


REFERENCES 


For examples of the use of finite-state 
machines to identify tokens of a programming 
language read: 


Gries, David. "The Scanner." Compiler Construction
for Digital Computers. John Wiley and
Sons, New York, 1971, pages 64 thru 71.


More information on FSM theory can be found 
in many books, including: 


Gill, A. Introduction to the Theory of Finite State 
Machines. McGraw-Hill, New York 1962. 




Linking and Loading 


Harry Tennant 


There are two processes that add 
enormously to the flexibility of com- 
puting systems. They are linking and 
loading. Loading is the process of bring- 
ing a piece of code in from secondary 
memory (eg: tape, disk, bubble memory)
and placing it in primary memory, ready 
to run. The most basic type of loader is 
the absolute loader, which can place a 
program in only one location in primary 
memory—the place it was written to 
reside in. In that location, the address 
references in the jumps, calls, moves, 
etc, all refer to the appropriate locations 
in memory. A more flexible and much 
more useful type of loader is the relo- 
cating loader. A relocating loader places 
a piece of code anywhere in memory 
that there is space available to accom- 
modate it. Addresses in the jumps, calls, 
moves, etc, that depend upon the place- 
ment of the code in memory are then ad- 
justed by the relocating loader so that 
they match the current placement of 
code. 

The linking process is used to com- 
bine several pieces of code that have 
been written separately into one piece. 
For example, these pieces could be a 
main procedure and a number of sub- 
routines. The main function of linking is 
to examine each piece of code for 
references to variables and procedures 
that are defined in other pieces, and 
translate these references into absolute 
addresses that must be used in the ad- 
dress fields of instructions. 

Linkers and loaders have existed for 
years on large computers, and are now 
beginning to be seen on microcom- 


puters. The construction, operation, and 
benefits of linkers and loaders will be 
discussed in this article. 


The Use of Linkers and Loaders 


The following situations will help to 
illustrate the benefits of linkers and 
loaders. After the situations, there will 
be a description of ways to implement 
the benefits. The situations described 
demand linkers and loaders of varying 
degrees of complexity. As with other 
programs, the complexity of linking and 
loading routines that are chosen for a 
computer should agree with the re- 
quirements put on the system and the 
available resources, such as memory 
space and type of secondary memory. 

Nearly all computers that are to be 
used for purposes other than simple con- 
trol functions will require storing pro- 
grams and data on some form of secon- 
dary storage. A loader is required to 
accept information from secondary 
storage and place it in primary memory 
(eg: from tape to main memory). The 
simplest kind of loader moves the bytes 
from the tape to memory, making only 
minor changes to the information. In 
particular, no address fields are 
changed. This is an absolute loader. The 
highest degree of intelligence exhibited 
by an absolute loader would probably 
be to receive information in a blocked 
format with a checksum. The absolute 
loader would verify the checksum, flag 
errors, and unblock the data. Absolute 
loaders are present on large computers 
for taking snapshots and reloading. A
snapshot is a copy of the entire program
and data space sent to secondary 
memory. Snapshots are taken during the 
course of a long computation for safety 
purposes. If some calamity occurs 
preventing completion of the job, such 
as a hardware malfunction, the user can 
call upon the absolute loader to load 
the last snapshot completed before the 
problem occurred. The job is continued 
from that point, minimizing the amount
of computing that needs to be
duplicated.


Figure 1: Forms of machine code. The segments of code used as input
to linkers and loaders, called object modules, are of three varieties:
absolute, relocatable, and binary-symbolic. Absolute code has no
unresolved external references and no added information necessary for
relocation. Notice that if absolute code is written with relative or
absolute addressing only, the segment can still be relocated. If some
addresses will need changing for relocation, they must be marked with
bit flags or pointers. (Pointers should always be made to addresses
relative to the origin of the code rather than to absolute locations.)
The binary-symbolic form of object module contains information
necessary for resolving references to symbols made in one segment. All
the tables shown here appear to occupy memory adjacent to the code
segment. Of course, there is no reason for this to be so.

Often code that has been written for 
one computer will not run on a second 
because the code was originally written 
in an area of memory of the first com- 
puter that does not exist in the second. 
One would like to be able to translate 
the program to an area of memory that 
does exist in the second computer. To 
do this, the address fields of jumps, 
calls, and other instructions will have to 
be altered. This is called relocation. 
Relocating loaders examine a table asso- 
ciated with the code to find the ad- 
dresses that need changing. The ad- 
dresses are changed, then the code is 
executed. 

It can be useful to defer relocation of 
code until it is actually used. For exam- 
ple, say one wanted to run a program 
that took 40 K bytes of memory to run, 
but it must be run on a computer that 
has only 20 K bytes of memory. It can be 
done, and is done all the time on larger 
machines. The program is broken into 
segments of convenient size. Some of 
the segments are loaded into primary 
memory and the rest are left out on 
some relatively fast secondary storage 
medium such as a disk or bubble 
memory. A table is made where each of 
the segments can be found. Each seg- 
ment is relocated so that its addresses 
correspond to its current location. As 
the program is executed the addresses 
are observed (through hardware) for 
references to segments that are not cur- 
rently in primary memory. When one is 
detected, an interrupt occurs that 
exchanges one of the segments that is in 
primary memory with the desired seg- 
ment from secondary memory. The seg- 
ment just loaded must now be relocated 
to its current position in primary
memory. Waiting until a segment is 
needed before adjusting its address 
fields is called dynamic relocation. The 
concept of running a program in a 
primary memory space smaller than that 
addressed by the program is called vir- 
tual memory. 

So far, all the situations described 
have been handled by loaders. When 
several subroutines are brought from 
secondary memory into primary mem- 
ory to be connected into a single pro- 
gram, each may be relocated just as 
above. In addition, however, references 
to subroutine names and names of vari- 
ables made in one piece of code that are 
defined in one of the other pieces of 
code must be resolved. That is, they 
must be changed from symbolic
references to actual memory-location
references. This is linking. As with
relocation, linking can be performed
before execution begins (ie: statically)
or when the reference is actually
encountered during execution (ie:
dynamically).


Forms of Machine Code


Before describing implementation
details, it is in order to point out that
three forms of machine code have been
alluded to above. See figure 1. The
simplest and most familiar is absolute
code, which will not be discussed
further. Next is relocatable code. This
type of code includes information to
tell the relocating loader which address
references are supposed to be adjusted if
the code is moved. It generally has the
block of code, written as though it were
to begin at memory location 0, and a
table of pointers to the
location-dependent addresses. If the code
is relocated to begin at location 137
instead of at 0, 137 is added to each
memory location pointed to from the
relocation table.

The third kind of code is
binary-symbolic code. This is like
relocatable code with two more tables
added on. The first table contains
memory locations associated with all the
symbols defined in this piece of code
which other pieces of code can refer to.
These primarily include subroutine
names, labeled addresses, and variable
names. The second table holds the
symbolic names referred to in this piece
of code which are defined in other
pieces. Associated with the names are
the locations where the references are
made.


Figure 2: Absolute version of a sample program. In the absolute
version, the main procedure and the two subroutines are written in a
block. This code is not relocatable. The main procedure, in order:
call the input routine at location 0020; load the AC with the byte at
0060; load H-L with the address 0061; add AC to the byte at the
location in H-L; call the output routine at 0030; compare AC to 0; if
not 0, continue; if zero, jump to absolute location 00DD. The INPUT
routine, the OUTPUT routine, and data bytes 1 and 2 are defined after
the main procedure.


Figure 3: Relocatable code. The sample program has been written with
the origin at 0000. The relocation table is located in a distant area
of memory. It is composed of a byte describing the number of bytes in
the table and double byte address pointers. If the code is to be
relocated to location 0137, then the addresses are selected one at a
time from the relocation table. 0137 is added to the address to find
the new absolute address that needs relocation. The double byte
address pointed to by this sum is now added to the relocation base,
0137, and this sum is substituted into the program. This process is
repeated for every address in the relocation table.


Relocating Loaders


There are several methods of telling
the relocating loader which memory
references need relocation. The first
method is to set a particular bit in the
address field, tagging a relocatable
address. This can be used in a
microcomputer by using the most
significant bit of addresses. A 1 would
indicate a relocatable address.
Addresses that are absolute are left
unchanged by the relocating loader and
given a 0 in the most significant
address bit. The relocator would scan
the code for instructions with
address-field operands, then check the
relocation bit.

This presents a few problems. First,
each byte needs to be examined, and
every instruction type must be held in a
table to record whether it is a 1-, 2-, or
3-byte instruction. Second, this method
cannot be used if the code is to be 
moved at any time during its execution
because the relocation information, the 
marker bit, has been destroyed. This 
makes dynamic relocation impossible. 

Dedicating an address bit to reloca- 
tion does not necessarily cut the poten- 
tial address space (eg: 16 bits or 64 K 
bytes) in half (eg:to 15 bits or 32 K bytes). 
It only implies that all absolute 
references must be to locations in the 
lower 32 K memory locations of primary 
memory. The code can still be placed 
anywhere in the 64 K address space, ad- 
justing the relocatable addresses as 
necessary. (The absolute addresses 
could all be contained in the upper 32 K 
instead of the lower 32 K by switching 
the marking convention to 0 for a 
relocatable address and 1 for an ab- 
solute address. This convention may be 
more appealing in systems where 
monitor subroutines and input/output 
(I/O) devices are located in the high ad- 
dresses of memory.) 
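
The tag-bit method can be sketched as follows. This Python fragment is a minimal illustration and assumes that the 16-bit address fields have already been located and extracted from the instructions (the step that needs the instruction-length table mentioned above). The names TAG and relocate_tagged are hypothetical.

```python
TAG = 0x8000          # most significant bit set: address is relocatable

def relocate_tagged(fields, base):
    """Return the address fields after relocation to origin `base`.

    Tagged fields lose their tag and have `base` added (modulo 64 K).
    Untagged fields are absolute and are returned unchanged.
    """
    out = []
    for field in fields:
        if field & TAG:
            out.append(((field & 0x7FFF) + base) & 0xFFFF)
        else:
            out.append(field)
    return out
```

With a base of hexadecimal 9000, the tagged field 8020 becomes 9020, which lies in the upper 32 K, while the absolute field 0030 is untouched. The result no longer carries the tag, which is why the code cannot be relocated a second time.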

A second and more common method 
for relocation is to make a table of the 
addresses that need to be relocated, and 
append the table to the code (see figure 
3). The code would be written as though 
it started at location 0. All the memory
addresses for internal jumps and calls,
etc, would be relative to this origin. A 
minimal relocation table would consist 
of a list of pointers that are relative to 
the origin of the code to the addresses 
needing relocation. If the code is placed 
in memory so that its origin is at location 
137, then the relocation table is used to 
add 137 to every address field pointed to 
from the relocation table. If the reloca- 
tion table and the address of the origin 
are saved, the segment of code can be 
relocated during execution by dynamic 
relocation. This is necessary for virtual- 
memory systems. If dynamic relocation 
is not desired, the relocation table can 
be eliminated after the first loading. 
Notice also that absolute memory 
references in the code are unaffected by 
this relocation method. They simply are 
not pointed to from the relocation table. 
Relocation tables have the advantage of 
requiring the relocating loader to deal 
only with those address fields that ac- 
tually require relocation. 
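
The relocation-table method can be sketched in Python as shown below. The sketch assumes that address fields are two bytes long and stored low byte first, and that the table holds offsets from the origin of the code to each field; the function name relocate is hypothetical. Hexadecimal CD and 3A are the 8080 opcodes for CALL and LDA.

```python
def relocate(code, table, base):
    """Add `base` to every 2-byte address field listed in `table`.

    code  - bytearray holding code written as though it began at 0
    table - offsets (relative to the code origin) of the address fields
    base  - the address at which the code will actually be placed
    """
    for offset in table:
        address = code[offset] | (code[offset + 1] << 8)   # low byte first
        address = (address + base) & 0xFFFF
        code[offset] = address & 0xFF
        code[offset + 1] = address >> 8
    return code

# CALL 0020 followed by LDA 0060, written for origin 0000.
program = bytearray([0xCD, 0x20, 0x00, 0x3A, 0x60, 0x00])
relocate(program, [1, 4], 0x0137)
```

After the call, the two address fields hold 0157 and 0197. Because the table and the base survive, the same routine can move the code again later by being given the difference between the new and old origins.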

In some circumstances it is desirable 
to relocate relative to more than one 
base. For example, when developing 
software for a microcomputer system, 
programs can be located in the read- 
only memory area of memory while data 
is placed in programmable memory. A 
separate relocation base can be used for 


the two areas. In the relocation-table 
method, multiple relocation bases can 
be accommodated by adding another 
field to each table entry, as in figure 4. 
This field designates the relocation base 
to be used. 

The information held in the relocation 
table can also be expressed by storing 
the code in variable sized blocks. Infor- 
mation associated with each block spec- 
ifies whether the instructions in the 
block need relocation, and if so, by what 
relocation base. 

Two observations have been made 
concerning memory references in 
machine code. First, the majority of 
references are the kind that require 
relocation. There are usually many more 
relocatable references than absolute 
references. Second, many of the relo- 
catable addresses refer to locations that 
are relatively close to where the jump or 
call is. Hardware features have been 
built into many larger computers and 
some microcomputers to take advan- 
tage of these observations. 

Because there are more relocatable 
references than absolute references in a 
piece of code, it would be more efficient 
if the relocating loader could manipu- 
late the absolute addresses and leave 
the relocatable addresses alone. This is 
accomplished on many computers by 
having a relocation register that is 
always added to address references. 
Consider again the code that was written 
to start at location 0, but is being 
relocated to start at location 137. 
Instead of changing the relocatable 
addresses, 137 is put into the relocation 
register, which is always added to 
address references. In this way, all 
relocatable addresses are displaced to 
the appropriate memory location 137 
locations higher in memory than the 
code was originally written for. If there 
is a reference in the program to, say, 
absolute location 70, it is relocated 137 
locations. To make this reference refer 
properly to absolute memory location 
70, in spite of the fact that it will be add- 
ed to the relocation register (containing 
the value 137), the relocating loader 
adds —137 to the address field. This 
cancels the effect of the relocation 
register, thereby allowing the reference 
to indicate absolute location 70 as it 
should. Relocation on a computer with 
relocation registers is much simpler than 
on other computers. The relocation 
table need only point to the absolute 
memory references, making it a much 
smaller table than it would be otherwise; 
an example of this is in figure 5. An 


instruction is generated, perhaps in the 
segment of code or perhaps in the oper- 
ating system, to set the relocation 
register before entering the code seg- 
ment. 
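
The arithmetic behind the relocation register can be sketched as follows. This Python fragment models only the address computation, not any particular machine, and the names effective_address and compensate_absolute are hypothetical.

```python
MASK = 0xFFFF   # 16-bit address space

def effective_address(field, reloc_reg):
    """What the hardware does: add the relocation register to every field."""
    return (field + reloc_reg) & MASK

def compensate_absolute(fields, absolute_indexes, base):
    """What the loader does: subtract `base` from the absolute fields only."""
    out = list(fields)
    for i in absolute_indexes:
        out[i] = (out[i] - base) & MASK
    return out

# One relocatable reference (0020) and one absolute reference (0070).
fields = compensate_absolute([0x0020, 0x0070], [1], 0x0137)
```

With the register holding 0137, the relocatable field is displaced to 0157 while the compensated absolute field still reaches 0070, so the loader's table needs an entry only for the absolute reference.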

When relocation registers are used, 
they must be altered or disabled when 
leaving the code segment. (See figure 6 
for a block diagram of a relocation 
register.) For instance, if an interrupt 
occurs, and an RST instruction is gener- 
ated, the relocation register must not be 
added to the address field of the gen- 
erated call. This can be done by dis- 
abling the relocation-register circuit 
with the interrupt-acknowledge signal. 
Alternatively, the contents of the relocation
register could be pushed onto the stack and
changed while the interrupt is processed.
They must then be restored before
the interrupt routine returns.
An instruction to reactivate use of the 
relocation register would have to be 
included in the interrupt subroutine, but 
not take effect until the return instruc- 
tion has been read. Otherwise, the fetch 
for the return instruction would be dis- 
placed by the relocation base. If the 
return is preceded by an interrupt 
enable, reactivation of the automatic 
indexing circuit could be dependent 
upon interrupts being enabled. This way, 
indexing is not resumed until after the 
return has been processed. It would also 
be necessary to disable the relocation 
register on input/output (I/O) instruc- 
tions. 

In addition to relocation registers, the 
relocation table can be greatly reduced 
by using relative addressing. Relative 
addressing allows references to memory 
locations that are a specified distance 
from the current location. On the 6800 
microcomputer, locations up to 125 
bytes before or 129 bytes after the cur- 
rent contents of the program counter 
can be referenced. On the 2650, loca- 
tions up to 64 bytes before and 63 bytes 
after the current contents of the pro- 
gram counter can be referenced. (This 
accounts for 7 bits, one sign and six for 
the number of bytes. The eighth bit is 
used for indirect referencing, which is 
useful for linking, as discussed below.) If
code can be written using only these
relative memory references and absolute
references, it can be relocated without
relocation registers and without a relo-
cation table of any kind. This is called
position-independent code. Code that uses
relative addressing works regardless of its
position in memory. This is useful for
dynamic relocation as well.
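
As a rough illustration of the 2650 operand layout described above (one indirect bit plus a 7-bit signed displacement), the following Python sketch decodes a relative operand byte. The function name decode_relative_operand is hypothetical, and the assumption that the indirect flag is the most significant bit follows the "eighth bit" wording in the text.

```python
def decode_relative_operand(operand):
    """Split an 8-bit relative operand into (indirect flag, signed displacement).

    Assumes bit 7 is the indirect flag and bits 6..0 are a two's-complement
    displacement, giving the range -64..+63 quoted in the text.
    """
    indirect = bool(operand & 0x80)
    displacement = operand & 0x7F
    if displacement & 0x40:          # sign bit of the 7-bit field
        displacement -= 0x80
    return indirect, displacement

def target_address(program_counter, operand):
    """Address referenced relative to the current program counter."""
    _, displacement = decode_relative_operand(operand)
    return program_counter + displacement

farthest_forward = decode_relative_operand(0x3F)
farthest_back = decode_relative_operand(0x40)
indirect_back_one = decode_relative_operand(0xFF)
```

Because the target is computed from the program counter, the same operand byte is valid wherever the code is loaded, which is why no relocation-table entry is needed for it.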

The concept of relocation registers 


Relocation Table                          Relocatable Code

Hexadecimal                               Hexadecimal
Address   Contents                        Address   Contents

F000      0F                              0000      CALL
F001      01                              0001      20
F002      00                              0002      00
F003      00  refers to address base      0003      LDA
F004      04                              0004      00
F005      00                              0005      00
F006      01  refers to address base      0006      LXI
F007      07                              0007      00
F008      00                              0008      00
F009      01  refers to address base      0009      ADC
F00A      0B                              000A      CALL
F00B      00                              000B      30
F00C      00  refers to address base      000C      00
F00D      10                              000D      CPI
F00E      00                              000E      00
F00F      00  refers to address base      000F      JNZ
                                          0010      00
                                          0011      00
                                          0012      JMP
                                          0013      DD
                                          0014      00


Figure 4: Relocatable code with multiple relocation bases. The use of 
more than one relocation base enables the loader to put different 
parts of the code into different areas of memory. Two bases are used 
in this figure, one for data and the other for the program. The third 
byte in each relocation table entry specifies which relocation base to 
use. Notice that the code would not run at location 0000 without 
relocation. The references to the data bytes at 0004 and 0007 need to 
be relocated out of the program area prior to execution. 


Relocation Table                          Relocatable Code

Hexadecimal                               Hexadecimal
Address   Contents                        Address   Contents

F000      02                              0000
F001      13                              0001
F002      00                              0002
                                          0003
                                          0004
                                          0005
                                          0006
                                          0007
                                          0008
                                          0009
                                          000A
                                          000B
                                          000C
                                          000D
                                          000E
                                          000F
                                          0010
                                          0011
                                          0012
                                          0013
                                          0014


Figure 5: Relocation table required with automatic indexing by a 
relocation register. Automatic indexing through a relocation register 
dramatically reduces the size of the relocation table. The contents of 
the relocation register are automatically added to all addresses. The 
table only references absolute addresses and is used to subtract the 
contents of the relocation register from the altered address, giving the 
absolute address. 




can be extended to implement virtual-
memory systems. Recall that virtual
memory allows the user to address more
memory space than there is primary
memory in the computer. The entire pro-
gram is stored in secondary memory,
divided into regular-sized chunks of be-
tween, say, 512 bytes and 4 K bytes each.
The chunks, called pages, that are cur-
rently in use are placed in primary
memory. For example, suppose the page size of a
particular computer is 1024 bytes. The 6
most significant bits of a 16-bit address
would refer to the page of the memory
reference, and the 10 least significant
bits refer to the address within the page.
As with automatic indexing, a circuit is 
built to operate on the addresses as they 
are generated by the processor. Instead 
of adding a register value, however, the 
most significant 6 bits of the address are 
used to address a small special memory, 
as in figures 7 and 8. The actual current 
locations for all the pages in the pro- 


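[Block diagram: the PROCESSOR address bus feeds a 16-BIT ADDER together with a 16-BIT REGISTER holding the relocation base (loaded through an 8-bit I/O PORT). SELECTION LOGIC, driven by INTERRUPT ACKNOWLEDGE and INTERRUPT ENABLE, controls a 16-bit MULTIPLEXER that passes either the relocated or the unrelocated address to the system ADDRESS BUS.]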


Figure 6: Relocation register block diagram. In automatic indexing, a 
16-bit relocation base is almost always added to the address lines 
before going onto the address bus. The only exception is when an in- 
terrupt occurs. The interrupt-acknowledge signal selects the non- 
relocated address to be passed through the multiplexer onto the bus. 
When the interrupt service routine has been completed and interrupts 
have been reenabled, the input to the address bus reverts back to the 
relocated addresses. Automatic relocation is also disabled for use of 
I/O instructions. 




gram are stored in the special memory 
called page memory. If the page ad- 
dressed is currently in primary memory, 
the 6 bits specifying the actual location 
of the 1024-byte page come out of the 
page memory onto the address bus. If 
the page addressed is not currently in 
primary memory, this reference causes 
an interrupt (perhaps by a 1 in the 
seventh bit of the page-memory output). 
An interrupt routine is called to locate 
the desired page in secondary memory, 
to select a relatively unused page that is 
in primary memory and send it out to 
secondary memory, to read in the 
desired page and to update the contents 
of the page memory to reflect the 
change. The memory reference that 
caused the interrupt (called a page fault) 
is then allowed to proceed. 
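
The address translation just described can be sketched in a few lines of Python. This is only a model of the idea, not a description of any particular machine: the names translate, PageFault and NOT_RESIDENT are assumptions made for the example, and the page fault is represented by an exception rather than a hardware interrupt. The single mapping used (logical page 07 held in physical page 01) is the one given in figure 8.

```python
PAGE_BITS = 10          # 1024-byte pages: the low 10 bits are the offset
PAGE_MASK = 0x3F        # 6-bit page number
NOT_RESIDENT = 0x40     # assumption: a 1 in the seventh output bit means "not in primary memory"

class PageFault(Exception):
    """Stands in for the interrupt raised when a page is not in primary memory."""

def translate(page_memory, logical_address):
    """Map a 16-bit logical address to a physical address through page memory."""
    logical_page = (logical_address >> PAGE_BITS) & PAGE_MASK
    offset = logical_address & ((1 << PAGE_BITS) - 1)
    entry = page_memory[logical_page]          # 7-bit word from the 64 by 7 memory
    if entry & NOT_RESIDENT:
        raise PageFault(logical_page)
    return ((entry & PAGE_MASK) << PAGE_BITS) | offset

# Start with every page out on secondary memory, then mark one page resident.
page_memory = [NOT_RESIDENT] * 64
page_memory[0x07] = 0x01                        # logical page 07 lives in physical page 01

physical = translate(page_memory, 0x1C05)       # logical page 07, offset 5

try:
    translate(page_memory, 0x0400)              # logical page 01 is not resident
    faulted_page = None
except PageFault as fault:
    faulted_page = fault.args[0]
```

In a fuller model the handler for PageFault would choose a victim page, write it out, read the wanted page in, update page_memory, and retry the reference.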


Linking 


The preceding section has dealt with 
loading code from secondary storage 
into primary storage, and loading with 
relocation. The process of linking pieces 
of code together is invariably discussed 
whenever loading is discussed, and vice 
versa. The two processes are often com- 
bined into one program called a linking 
loader. A linking loader takes several
procedures, resolves the references in
each procedure to symbols defined in
the other procedures (linking), and loads
and relocates all the pieces into primary
memory.

The process can be executed in two 
passes like a two-pass assembler. In the 
first pass a segment of code is loaded 
and relocated. A global symbol table is 
compiled from all the symbols used or 
defined in the current segment of code 
but which are defined or used in the 
other segments. Then the next segment 
is loaded and the global symbol table is 
added to. This is continued until all 
segments are loaded. In the second pass, 
the global symbols that are referenced 
in each segment, but defined elsewhere, 
are resolved using the global symbol 
table. This consists of substituting ap- 
propriate locations into the address 
fields of calls, jumps, and data-area 
references. 
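
A minimal Python sketch of the two passes follows. It makes several simplifying assumptions that are not in the article: each memory cell holds a whole instruction or a whole address, each segment is a dictionary with hypothetical keys "code", "defs" and "uses", and segments are loaded one after another from a given origin.

```python
def link_and_load(segments, origin=0):
    """Two-pass linking loader sketch; returns (memory image, global symbol table)."""
    # Pass 1: place each segment and record where its global symbols end up.
    symbols, bases, address = {}, [], origin
    for segment in segments:
        bases.append(address)
        for name, offset in segment["defs"].items():
            symbols[name] = address + offset
        address += len(segment["code"])

    # Pass 2: substitute the resolved locations into the address fields.
    memory = {}
    for segment, base in zip(segments, bases):
        image = list(segment["code"])
        for name, offset in segment["uses"]:
            image[offset] = symbols[name]
        for offset, cell in enumerate(image):
            memory[base + offset] = cell
    return memory, symbols

main = {"code": ["CALL", None, "HALT"], "defs": {}, "uses": [("SQRT", 1)]}
library = {"code": ["SQRT-BODY", "RET"], "defs": {"SQRT": 0}, "uses": []}

memory, symbols = link_and_load([main, library], origin=0x0100)
```

The first pass must finish before any reference is patched, because a segment may refer to a symbol defined in a segment that is loaded later, as main does here.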

Linking loaders are quite common, 
but there is no reason that linking must 
be done while loading. Linking can be 
done as early as prior to assembly or as 
late as at execution time (ie: dynamic 
linking). A linkage editor is a program 
that only performs linking. A loader 
must be used to load link-edited code at 
a later time. 


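[Block diagram: the six MOST SIGNIFICANT address bits from the PROCESSOR address a 64 BY 7 page MEMORY. Its outputs supply the upper bits of the ADDRESS BUS TO SYSTEM and the PAGE FAULT INTERRUPT BIT; the LEAST SIGNIFICANT bits go directly to the address bus.]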


Figure 7: Page memory. In this figure, the page memory is shown as a 64 by 7 bit memory. The six address bits to the 
memory are tied to the six most significant bits of the CPU address bus outputs. Six of the seven output bits of the
page memory become the most significant bits of the address sent out on the address bus. The seventh output bit 
designates whether or not the page specified by the CPU is currently loaded in primary memory. If not, an interrupt is 
generated to bring the desired page in from secondary memory. 


[Diagram: LOGICAL PAGE NUMBERS FROM THE PROCESSOR index PAGE MEMORY, whose contents give the PHYSICAL PAGE IN PRIMARY MEMORY.]


Figure 8: Physical and logical pages. This 
more detailed look at the sample con- 
tents of page memory illustrates the dif- 
ference between physical and logical ad- 
dresses. The processor outputs a page 
number, say page 07. This is the logical 
page number. To the processor it appears 
as though it is addressing the 1024 bytes 
starting at location 0001110000000000. 
The page memory converts the logical 
page address to a physical page address. 
The physical page corresponding to 
logical page 07 is the 1024 bytes starting 
at location 0000010000000000 (physical 
page 01). The logical pages that are 
physically in secondary memory, such as 
logical pages 01,02,06,7A,7B, etc, will 
cause an interrupt if addressed. 




Global Symbol Dictionary

Symbol Definition Table
(empty)

Symbol Use Table
INPUT     0EE3
OUTPUT    0EE6

Relocation Table
0004
0007
0010

Relocatable Code (hexadecimal addresses 0000 to 0014)
The two calls in the sample code now read CALL 0EE3 and CALL 0EE6.

Subroutines (hexadecimal addresses 0EE3 to 0EE8)
0EE3      JMP *     (INPUT; address field at 0EE4 and 0EE5)
0EE6      JMP *     (OUTPUT; address field at 0EE7 and 0EE8)

* address filled in at linking time


Figure 9: Transfer vector linking. In the sample program the two 
subroutines called have been assumed to be internal to the code seg- 
ment. From this point on, they are assigned the symbolic names INPUT 
and OUTPUT, and are assumed to have been defined externally (in 
another object module). The sample code segment defines no symbols 
for use by other object modules, so the symbol definition table is emp- 
ty. Here we see preparations made for linking calls to the routines IN- 
PUT and OUTPUT. Jump instructions have been generated by the 
linker and placed in memory. All the calls to INPUT, as well as to 
OUTPUT, call the same jump instruction. The linker will combine the 
symbol definition tables from all object modules to form one master 
table. When this has been done, the address of the first location of the 
INPUT routine is placed in the address field of the INPUT jump in- 
struction (hexadecimal 0EE4 and 0EE5). Similarly for OUTPUT (hex-
adecimal 0EE7, 0EE8). Notice that the address fields of the call instruc-
tions need to be relocated only if the jumps are relocated. 




In order to link code segments, the 
code must be in binary-symbolic form. 
This form includes relocation informa- 
tion and a global-symbol dictionary. 
This dictionary contains two tables. The 
first is a table of the symbolic references 
(ie: names) to subroutines, data loca- 
tions, etc, that are defined in other 
segments of code. This table, the 
symbol-use table, consists of the sym- 
bolic name and an identification of the 
location in the code where the symbol is 
referenced (eg: the address field of a call 
instruction). The other table, the symbol- 
definition table, is a listing of all the 
global symbols that are defined in each 
code segment and the pointers to the 
locations where the symbols are defined 
(eg: pointers to the beginning of 
subroutines, data areas, other blocks of 
code). Four methods of building symbol- 
use tables will be discussed. 

The first method of symbol-use table 
building to resolve references to sym- 
bols defined outside of the code seg- 
ment is the transfer-vector technique. 
(See figure 9.) A jump instruction is ap- 
pended to the code for each external 
symbol. All calls in the code segment to, 
say, the function SQRT, defined in a dif- 
ferent segment, are calls to the location 
of the appended jump to the SQRT func- 
tion starting address. (The SQRT routine 
ends with a return-from-subroutine in- 
struction, which returns program control 
to the code following the call.) Until the 
segments are linked, the address field of 
the jump instruction is undefined. The 
linking process places the starting 
address of the SQRT subroutine in the 
jump address field. If there are many 
calls to SQRT in the segment they all 
call the same jump instruction. Notice 
that the address field of the call is an 
address requiring relocation unless it is a 
relative address. The 6800 and 2650 both 
can accommodate relative addressing, 
but the 2650 has the additional advan- 
tage of indirection. Instead of calling a 
jump instruction, the 2650 can use an 
indirect call. The indirect call references 
a memory location that contains the 
address of the subroutine. This saves 
entering the jump command. The trans- 
fer-vector technique works well for calls 
and jumps, but it cannot work for refer- 
ences to data. 
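
The transfer-vector idea can be modelled in Python as follows. The representation is an assumption made for the example: memory is a list of cells, an address field that refers to an external symbol is written as the string "@NAME" before the vector is built, and each jump occupies two cells (the opcode and its address field). The function names are hypothetical.

```python
def append_transfer_vector(code):
    """Append one jump slot per external symbol and point every call at its slot."""
    externals = sorted({cell[1:] for cell in code
                        if isinstance(cell, str) and cell.startswith("@")})
    slots = {name: len(code) + 2 * i for i, name in enumerate(externals)}
    memory = [slots[cell[1:]] if isinstance(cell, str) and cell.startswith("@") else cell
              for cell in code]
    for _ in externals:
        memory += ["JMP", None]       # address field undefined until linking
    return memory, slots

def link_transfer_vector(memory, slots, symbols):
    """Linking touches only the jump address fields, never the calls themselves."""
    for name, slot in slots.items():
        memory[slot + 1] = symbols[name]

code = ["CALL", "@SQRT", "ADD", "CALL", "@SQRT", "RET"]
memory, slots = append_transfer_vector(code)
before_linking = list(memory)
link_transfer_vector(memory, slots, {"SQRT": 0x0E00})
```

Both calls share the single jump at slots["SQRT"], so the linker writes one address no matter how many calls the segment contains.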

A second technique of resolving refer- 
ences to symbols external to a code seg- 
ment is chaining. (See figure 10.) The
symbol-use table consists of the symbol 
name and a pointer to the first occur- 
rence in the segment where the refer- 
ence is made. If there is more than one 


reference to the same external symbol in 
the segment, the address field of the 
table entry points to the first reference, 
the address field of the first reference 
points to the second reference, and so 
on. The address field of the last 
reference contains a special address, 
such as hexadecimal FFFF, signifying the 
end of the chain. When the segment is 
linked to other segments, the linker 
starts at the symbol-use table and goes 
from address field to address field. At 
each address field the actual location of 
the external memory reference is 
inserted. Chaining can be used for all 
types of memory references: calls, data, 
or others. Chaining is therefore more 
general than transfer vectors, but it is 
still limited to static linking. Notice that 
in both techniques, the links to the jump
instruction (for the transfer vector) or to 
the next address field (for the chain) are 
lost after linking. There is no way to 
relink the segment dynamically if this 
were required. 
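
Walking a chain is a short loop, sketched here in Python. The function name resolve_chain is hypothetical, memory is modelled as a dictionary from address-field location to contents, and the four SQRT locations are borrowed from figure 11 on the assumption that the chain visits them in that order.

```python
END_OF_CHAIN = 0xFFFF   # special address marking the last reference

def resolve_chain(memory, first_reference, target):
    """Follow the chain from the symbol-use table entry, patching as we go.

    Returns the number of address fields patched. The chain itself is
    overwritten, which is why the segment cannot be relinked afterwards.
    """
    location, patched = first_reference, 0
    while location != END_OF_CHAIN:
        next_location = memory[location]   # read the link before destroying it
        memory[location] = target
        location = next_location
        patched += 1
    return patched

# Each address field holds the location of the next reference to SQRT.
memory = {0x00A9: 0x00B5, 0x00B5: 0x00BF, 0x00BF: 0x00CA, 0x00CA: END_OF_CHAIN}
count = resolve_chain(memory, first_reference=0x00A9, target=0x0E00)
```

After the call every field holds the address of SQRT and no trace of the chain remains.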

The third technique combines the 
flexibility of chaining with the capability 
for dynamic linking. This method has a 
list of address fields in the external 
reference table associated with each 
symbol. (See figure 11.) Each address 
field in the list associated with a symbol 
points to a location in the code segment 
where that symbol is referred to. When 
the linker operates on this table, it finds 
the current location that the symbol 
refers to, then goes down the address list 
of the symbol substituting the proper ad- 
dress for each reference to the symbol in 
the code segment. The advantage of 
keeping a list in the symbol-use table is, 
of course, that it need not be destroyed 
after linking. A code segment can then 
be relinked if there is a need to do so. 
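
Keeping the list makes relinking a matter of running the same loop again, as this Python sketch suggests. The function name link_from_lists and the two target addresses are assumptions for illustration; the symbol names and locations follow figure 11.

```python
def link_from_lists(memory, use_table, symbols):
    """Patch every listed reference; the use table itself is left untouched."""
    for name, locations in use_table.items():
        if name in symbols:
            for location in locations:
                memory[location] = symbols[name]

use_table = {
    "INPUT": [0x00A1],
    "SQRT": [0x00A9, 0x00B5, 0x00BF, 0x00CA],
}
memory = {location: None for locations in use_table.values() for location in locations}

link_from_lists(memory, use_table, {"INPUT": 0x0D00, "SQRT": 0x0E00})
first_link = dict(memory)

# SQRT has moved; only its symbol entry is supplied, and only its references change.
link_from_lists(memory, use_table, {"SQRT": 0x2000})
```

The price of this flexibility is table space: every reference costs a list entry that must be kept for as long as relinking might be needed.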

The last technique is similar to sym-
bol and list construction. Instead of
having a list of pointers for each time a 
symbol is used, a different symbol and 
pointer entry is made for each use. (See 
figure 12.) Therefore, if the segment con- 
tains four calls to SQRT, the use table 
will include four SQRT entries, each 
with one pointer. 

Dynamic linking makes the linker
more complex. It also requires that the
symbolic form of global symbols
be retained in the global symbol dic-
tionary. Dynamic linking is needed to
provide maximum flexibility in
memory allocation.

The virtual-memory description given 
earlier in the article assumed a program 
that had already been linked. However, 
virtual memory can benefit from 


SYMBOL USE TABLE

SYMBOL NAME   HEX ADDRESS WHERE USED

INPUT         00A1
SQRT          00A9
OUTPUT        00D8
OPERAND1      00AE

[Diagram: each entry points into the ADDRESS/CONTENTS column of the code, to the address field of the first reference to that symbol.]


Figure 10: Chained references to global (external) symbols. The four 
references to SQRT are chained together. The last reference in the 
chain is signified by hexadecimal FFFF in the address field. Unlike vec- 
tor linking, chaining may refer to more than jumps and subroutine 
calls. The symbol OPERAND1 refers to a data byte address. This is the 
advantage of chaining over vector transfer. 




SYMBOL USE TABLE

SYMBOL NAME   HEX ADDRESSES WHERE USED

INPUT         00A1
SQRT          00A9, 00B5, 00BF, 00CA
OUTPUT        00D8
OPERAND1      00AE

[Diagram: each listed address points into the ADDRESS/CONTENTS column of the code, to an address field that refers to the symbol.]


Figure 11: Multiple references listed. This method has the advantage 
of chaining, but is nondestructive. This allows relinking if desired. 
Also, a reference can be made to an address relative to a defined sym- 
bol by adding the relocation constant to the contents of the address 
pointed to. Here, the address at hexadecimal 00AF refers to a location
1B bytes beyond the location of OPERAND1. 




dynamic relocation, with dynamic link- 
ing at execution time. Execution is 
begun before any linking is done. The 
address fields of instructions with 
unresolved references are tagged so that 
they will cause an interrupt. For exam- 
ple, assume that all the unresolved ad- 
dress fields are filled with all ones. 
When a reference is made during execu- 
tion to hexadecimal FFFF, an interrupt is 
generated. The interrupt service routine 
then performs the linking for the 
referenced symbol, bringing in pages 
from secondary memory, if necessary. 
This way, linking to a symbolic reference 
is put off until the first actual reference 
is made during execution. When the first 
reference is made, all references to the 
symbol are linked. 
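
The following Python sketch models linking on first reference. It is an illustration under stated assumptions, not a description of a real system: the interrupt is modelled as an ordinary method call, the class name LazyLinker and its members are hypothetical, and the lookup of a symbol's current address is passed in as a function.

```python
UNRESOLVED = 0xFFFF   # all ones marks an address field that is not yet linked

class LazyLinker:
    def __init__(self, memory, use_table, locate):
        self.memory = memory          # {address-field location: contents}
        self.use_table = use_table    # {symbol: [locations that refer to it]}
        self.locate = locate          # callable: symbol name -> current address
        self.faults = 0               # number of simulated interrupts serviced

    def read_address_field(self, location):
        """Fetch an address field, linking its symbol first if necessary."""
        if self.memory[location] == UNRESOLVED:
            self._service_link_fault(location)
        return self.memory[location]

    def _service_link_fault(self, location):
        self.faults += 1
        name = next(symbol for symbol, locations in self.use_table.items()
                    if location in locations)
        target = self.locate(name)
        for reference in self.use_table[name]:   # link every reference at once
            self.memory[reference] = target

use_table = {"SQRT": [0x00A9, 0x00B5, 0x00BF, 0x00CA]}
memory = {location: UNRESOLVED for location in use_table["SQRT"]}
linker = LazyLinker(memory, use_table, locate={"SQRT": 0x0E00}.__getitem__)

first = linker.read_address_field(0x00B5)    # triggers the one and only link fault
second = linker.read_address_field(0x00CA)   # already linked, no fault
```

One consequence of this tagging scheme is that hexadecimal FFFF can no longer be used as an ordinary address.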

At the other extreme from dynamic 
linking is linkage editing. A linkage 
editor, as mentioned earlier, is not con- 
cerned with relocation and loading prior 
to the execution of code. The linkage 
editor takes several code segments in 
binary-symbolic form, and relocates and 
links them into one large module (ie: a 
load module). The addresses of the en- 
tire load module are written with an 
origin at location 0. All references to 
global symbols are resolved. The reloca- 
tion information is retained within the 
load module. The entire load module 
can now be relocated to be executed, or 
it can be sent to secondary storage for 
use at a later time. The word editor in 
linkage editor points out the fact that 
after linking has been completed, the 
user has an opportunity to make 
changes in the load module prior to exe- 
cution, if desired. 

Linking symbolic references between 
code segments has been described as if 
all code segments are presented to the 
linker at every linking operation. This is 
not always the case. A collection of 
segments may always be assumed avail- 
able to the linker. This collection is the 
system subroutine library. If the linker 
cannot resolve a symbolic reference us- 
ing the global symbols defined in the 
code segments that are currently being 
linked, it searches for the symbol in the 
global-symbol table of the system 
library. 


Conclusion 


Linking and relocatable loading pro- 
vide a computer system with con- 
siderable flexibility. Relocatable loading 
permits a segment of code to be exe- 
cuted in any area in primary memory 
that is available at load time. This gives 


much more freedom than insisting that 
code be executed in the locations it was 
written in. Linking of code segments 
encourages the development of modu- 
lar software and the reuse of code 
segments in different contexts.


BIBLIOGRAPHY 


1. Barron, D. W. Assemblers and Loaders. New 
York: American Elsevier, 1969. 


2. Gear, C. W. Computer Organization and Pro- 
gramming. New York: McGraw-Hill, 1974. 


3. Organick, E. I. The Multics System: An Exa-
mination of its Structure. Cambridge, Mass:
MIT Press, 1972.


4. Presser, L. and J. R. White. "Linkers and
Loaders" in Computing Surveys, vol. 4,
September, 1972, pp. 149-167.


5. Shaw, A. C. The Logical Design of Operating
Systems. Englewood Cliffs, N.J.: Prentice-Hall,
1974.


Figure 12: Multiple entries in symbol use 
table for multiple references. An alter- 
native way of making entries in the sym- 
bol use table is to make a separate entry 
for each external reference. The code in 
this figure makes four references to 
SQRT. 


SYMBOL USE TABLE

SYMBOL NAME   HEX ADDRESS WHERE USED

INPUT         00A1
SQRT          00A9
OPERAND1      00AE
SQRT          00B5
SQRT          00BF
SQRT          00CA
OUTPUT        00D8

[Diagram: each entry points into the ADDRESS/CONTENTS column of the code, to the address field where the symbol is used.]






Data Handling 


About This 
Section 


One of the main elements of any pro- 
gram is the data being operated on by 
the program. Data can take many forms, 
and can change form depending on 
many factors. The data can represent 
pure numbers, alphabetic characters, or 
even both simultaneously. It all depends 
on how you look at what is encoded in 
memory, on tape, on disk, and so forth. 
And that is entirely the point of this sec- 
tion — how the programmer can use his 
or her perception of data as an advan- 
tage to simplify the programming task. 

The topics covered in this section 
make both the task of manipulating the 


data and storing the data easier and 
more efficient. This includes articles on 
hashing functions, tables, and sorting. 
One article will even help you avoid 
serious errors while passing data back 
and forth to subroutines. 

The arrangement of data on tape or
disk is always important, leading to
decreased access time or more
efficient use of space if done
well. When unlimited time or space is
available, these topics are moot; but in 
most microcomputer systems, they must 
be dealt with on a daily basis. 


Sorting It Out 


Have you ever noticed that as you get 
further into programming you have use 
for a basic group of routines fairly fre- 
quently? Some examples are: input/ 
output (I/O), table searching, and 
American Standard Code for Informa- 
tion Interchange (ASCII) digit-to-binary 
conversion. You can think of these 
routines as primitive software functions 
since they are used by many different 
types of software. An analogy would be 
the relationship between the compiler 
(be it FORTRAN or Pascal) and the 
machine instructions in your computer. 
The machine instructions are used by 
any compiler, so in this sense they are 
primitives. This article talks about 
another type of software primitive, sort- 
ing. Rearranging a card hand, entering 
names in a black book, and putting club 
members’ names in alphabetical order 
are all examples of sorting. The basic 
idea is to use some information asso- 
ciated with the items you are sorting to 
arrange the items by increasing or 
decreasing value. In general, each item 
you sort has two elements associated 
with it, a key and data. The key is the 
part of the item that is used for com- 
parison to see which item gets placed 
before another, while the data asso- 
ciated with the key tags along with it. 
Often the key and the data are the same, 
such as cards in a bridge hand. Other 
times they are two distinct creatures, 
such as the list of entries in a telephone 
directory. In the case of the directory, 
the name is the key while the address 
and number are the data. 


Brian D Murphy 


The idea of this article is not to give 
you a thorough understanding of the 
workings of the various sorting tech- 
niques, but to provide an easy-to-use 
source to go to if you have a program to 
write that requires sorting. There are 
dozens of sorting schemes, each having
its own points of interest and usage,
and if you really want to get down to the
details, I have listed three excellent
sources at the end of this article for that
purpose.

Included are four techniques to 
choose from when trying to decide 
which method is best suited to your 
particular application. Consult table 1 
for the characteristics you desire, and 
locate those software characteristics 
most important to you. 


Table Size <20        Table Size >20

Insertion Sort


Table 1: Comparison of four different sorting techniques. To find the 
optimal sort for your purpose, it is sometimes necessary to have a trade- 
off between execution speed and the amount of memory used. *The 
counting method is best for speed if less than sixty items are involved 
and the object of the sort is a ranking of entries. 




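[Flowchart: BINARY INSERT. The final step moves V(C) through V(I-1) up one place, to V(C+1) through V(I).]


100 FOR I=2 TO N
105 REM BINARY SEARCH FOR THE POSITION C WHERE V(I) BELONGS
110 A=1
120 B=I
130 IF A>B THEN 200
140 C=INT((A+B)/2)
150 IF V(C)>=V(I) THEN 180
160 B=C-1
170 GOTO 130
180 A=C+1
190 GOTO 130
200 C=A
210 IF C>=I THEN 270
215 REM MOVE V(C)...V(I-1) UP ONE PLACE AND INSERT THE SAVED ITEM
220 D=V(I)
230 FOR J=I TO C+1 STEP -1
240 V(J)=V(J-1)
250 NEXT J
260 V(C)=D
270 NEXT I
280 RETURN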


If the items to be sorted are viewed as a 
row of N cards laid out on the table, 
binary insertion is used to sort them in 
the following manner. Moving from left 
to right, each card is compared to the 
ones preceding it to see which two it fits 
between. When the correct position is 
found, all cards to the right of the posi- 
tion are moved right one position, and 
the card is placed in the proper location. 


Figure 1: Flowchart of Binary Insertion with a BASIC program. 




Using the idea of the table as a row of N 
cards, selection sorting works as follows. 
Moving from left to right, each card is 
compared to all those to the right of it 
and is swapped with the card that is 
largest of those cards found to be larger 
than it. 


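100 FOR I=1 TO N-1
110 A=I
120 FOR J=I+1 TO N
125 REM A TRACKS THE LARGEST ITEM FOUND SO FAR
130 IF V(J)<=V(A) THEN 150
140 A=J
150 NEXT J
155 REM SWAP THE LARGEST ITEM INTO POSITION I
160 T=V(A)
170 V(A)=V(I)
180 V(I)=T
190 NEXT I
200 RETURN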


Figure 2: Flowchart of selection sort with 
a BASIC program. 


As mentioned in the introduction, counting does not rear- 
range a table of items, but generates a corresponding table 
that gives, for each item, its relative standing in its table (ie: 
how many items are greater or less than it). Suppose you 
have a list of participants and their scores from a golf tourna- 
ment and you wish to find the standing of each player. The 
counting method would be used as follows. Going down the 
list from top to bottom, each player’s score is compared to 
all those below his on the list. If his score is greater than 
someone else’s score, a tally mark is entered next to the 
other player’s name on the list. If his score was the same or 
less, he gets the tally mark. After you have moved all the way 
down the paper, you count each player’s tally marks. A 
player with fewer marks next to his or her name has a higher 
standing than another who has more. 


100 FOR I=1 TO N-1
110 FOR J=I+1 TO N
113 REM
114 REM ***********************************
115 REM *** IF THE LOWEST KEY IS TO RECEIVE THE STANDING OF 1
116 REM *** USE A > TEST NEXT RATHER THAN A <
117 REM ***********************************
118 REM
120 IF V(I)<V(J) THEN 150
130 S(J)=S(J)+1
140 GOTO 160
150 S(I)=S(I)+1
160 NEXT J
170 NEXT I
180 RETURN


Figure 3: Flowchart of counting with a BASIC 
program. 




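[Flowchart: HEAP SORT. The first step sets A to the integer part of N/2, plus 1, and B to N.]


100 A=INT(N/2)+1
110 B=N
120 IF A<=1 THEN 160
130 A=A-1
140 C=V(A)
150 GOTO 220
160 C=V(B)
170 V(B)=V(1)
180 B=B-1
190 IF B<>1 THEN 220
200 V(1)=C
210 RETURN
220 D=A
230 I=D
240 D=2*D
250 IF D=B THEN 290
260 IF D>B THEN 320
270 IF V(D)>=V(D+1) THEN 290
280 D=D+1
290 IF C>=V(D) THEN 320
300 T=V(D)
305 V(D)=V(I)
310 V(I)=T
315 GOTO 230
320 V(I)=C
330 GOTO 120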


Deviating from the policy of trying to give an idea of what is happening, I will
not try to explain how the Heap Sort works. It is similar to the process of 
eliminating players in a tournament: playing the winners of matches against 
each other. If you really want to see how it works, read the description of it in 
the book by Mark Elson listed in the references. It is not that difficult to
understand, but it takes a fair amount of room to explain. This method is one 
of the fastest for larger lists of items. 


Figure 4: Flowchart of Heap Sort with a BASIC program. 


Two categories of methods are listed. 
The first category, represented by binary 
insertion (figure 1), selection (figure 2), 
and heap sorting (figure 4), actually rear-
ranges a list or table of items. The other
category, represented by counting 
(figure 3), generates a second list or 
table containing the standings of each 
item in the original table. An example of 
counting is a professor checking an 
alphabetical list of exam grades to 
determine the class standing of each stu- 
dent. As shown in table 1, if relative 
standing is the goal of the sorting, and if 
there are fewer than sixty items to be 
examined, counting is the fastest tech- 
nique. If a table must be rearranged, to
print out a membership list for example,
then the first category of techniques 
must be used. 

Before getting into the methods of 
sorting, you should note that each
method is defined by a flowchart and a 
BASIC program. BASIC was chosen 
because it is the most prevalent high- 
level language in personal computers at 
this time. The flowcharts and programs 
assume that the key and data for each 
item are the same (as in the card hand). 
If they are separate, as in the case of the 
telephone directory, a second table con- 
taining the data must be kept. In this 
case the two tables (ie: key and data) 
must be thought of as tied together. If 


keys swap position during sorting in the 
key table, the corresponding operation 
must be performed on the data table. To 
use a method chosen from table 1, you 
can either code in assembly language 
from the flowchart or the BASIC listing, 
or, if you have BASIC on your system, 
you can use the program as it stands. 
The sorting programs are presented as 
subroutines that sort the keys stored in 
the array V(I). If you relate to assembly
language rather than BASIC, V(I) refers 
to a table of keys stored in consecutive 
memory locations. The associated data, 
if any, is stored in a corresponding table. 
When sorting occurs the keys are often 
swapped. If this happens, the corre- 
sponding data entries must also be 
swapped. In binary insertion, movement 
of entries, rather than swapping, occurs, 
in which case the data entries must be
moved to correspond to the key move-
ment.
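
The idea of keeping the key table and the data table tied together can be sketched as follows. The sketch is in Python rather than BASIC, the function name selection_sort_with_data and the sample directory entries are invented for illustration, and the ordering (largest key first) follows the selection listing in figure 2.

```python
def selection_sort_with_data(keys, data):
    """Selection sort on keys, largest first; data entries follow their keys."""
    n = len(keys)
    for i in range(n - 1):
        a = i                                  # index of the largest key seen so far
        for j in range(i + 1, n):
            if keys[j] > keys[a]:
                a = j
        keys[i], keys[a] = keys[a], keys[i]    # swap the keys...
        data[i], data[a] = data[a], data[i]    # ...and the same swap on the data table

names = ["Jones", "Adams", "Smith"]            # key table
numbers = ["555-0102", "555-0101", "555-0103"] # data table, same order as the keys
selection_sort_with_data(names, numbers)
```

Every operation on the key table is repeated on the data table with the same indices, so entry I of one table always describes entry I of the other.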


REFERENCES 


Donovan, J J. Systems Programming. McGraw- 
Hill, 1972. 


Elson, Mark. Data Structures. Science Research 
Associates, 1975. 


Knuth, Donald E. The Art of Computer Program- 
ming, Volume 3: Sorting and Searching. Addison 
Wesley, 1973. 




Computer Information 
Arrangement 


David Holladay 


An examination of the small-system 
computer field might lead the observer 
to take a limited view of the potential 
uses of small systems. This is unfor- 
tunate, because a computer, even a 
small one, can do more than play games 
or make lights blink. 

One general application of computers 
is the information retrieval system. A 
classic goal of information retrieval is 
the construction of a system that ab- 
sorbs the contents of books and can 
answer questions concerning the infor- 
mation contained in them. This goal has 
been unapproachable in even the largest 
of computer systems. The best approach 
is to put the burden of intelligence on 
the user’s shoulders and make use of the 
computer's bookkeeping ability. This 
reduces the program to a large-scale 
sorting system tailored to a microcom- 
puter’s capabilities. 

Small systems have limitations in 
memory size, data transfer rates and 
throughput. To cope with these limita- 
tions, | propose a mass information 
handling system called the Computer In- 
formation Arrangement, or CIA. The 
basic hardware required for this system 
includes a processor, 8 to 16 K bytes of 
programmable memory, keyboard, 
video interface and several cassette in- 
terfaces with a data rate of at least 300 
bps. One cassette drive has to be con- 
trollable by the computer in a manner 
beyond that of simple motor control. 


The main storage memory for the 
huge data base is magnetic tape. Tape is 
slow and serial, meaning that only the 
information physically located near the 
tape head can be dealt with — however, 
it is inexpensive. For the moment, our 
data base will be a dictionary (ie: a list of 
definitions sorted alphabetically by 
keyword). If the dictionary is closely 
packed on the tape, it will be difficult to 
add to it without shifting half of the data 
base. It would be more logical to spread 
out the entries on the whole tape to 
avoid future space problems. Unless the 
tape is getting full, the proper position 
of an entry is solely a keyword function. 
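
One way to make the tape position depend only on the keyword is to read the leading letters of the keyword as a base-26 fraction and scale it to the length of the tape. The sketch below assumes a tape addressed in numbered blocks; TAPE_BLOCKS, the function name and the three-letter depth are illustrative choices, not part of the CIA design.

```python
# Hypothetical tape geometry: the tape is treated as 1,000 numbered blocks.
TAPE_BLOCKS = 1000


def tape_block(keyword, depth=3):
    """Map a keyword to a block number using its first `depth` letters.

    The letters are read as a base-26 fraction in [0, 1), so the
    alphabetical order of keywords is preserved in block order.
    """
    letters = [c for c in keyword.upper() if "A" <= c <= "Z"][:depth]
    fraction = 0.0
    scale = 1.0
    for c in letters:
        scale /= 26
        fraction += (ord(c) - ord("A")) * scale
    return int(fraction * TAPE_BLOCKS)


# "best" lands near the start of the tape, "machine" near the middle.
print(tape_block("best"), tape_block("machine"))
```

Because the mapping is linear in the alphabet, common initial letters will crowd their region of the tape; a real system would probably weight the ranges by expected use.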

If entries are to be added in an effi- 
cient manner, close attention must be 
paid to differing data rates. A human 
can type 2 to 5 characters (ie: bytes) per 
second, while a computer can take 
things on or off tape much faster. The 
typical personal computer can internal- 
ly manipulate at least 250,000 bytes per 
second when programmed with an 
assembler. A video display can depict 
about 1,000 characters at a time, and 
can refresh itself 30 or 60 times per sec- 
ond, depending on the way the display 
handles the interface. It will be the ob- 
jective of the CIA system to put informa- 
tion onto the cassette tape in sorted 
order as fast as the user can type in the 
unsorted data. The user can therefore 
type the definition of words like best
and machine into our imaginary dictionary,
and the computer will place
both in their proper places on the tape.


Figure 1: This is a basic diagram showing the input arrangement for the information retrieval system. The text is entered
from a keyboard into a buffer area, and sorted alphabetically. In the example, the three text strings start with S, D and
K. When a certain buffer area is filled the information it contains is dumped to an input cassette tape. The information
on the tape is sorted in alphabetical order. When any one tape is filled, or an updated master file is desired, the input
tape is added to the master file.
[Diagram labels: TEXT; H,I,J,K,L; U,V,W,X,Y,Z; MEMORY BUFFER; INPUT CASSETTE TAPE; MASTER FILE.]

A large part of main memory, at least 
4 K bytes, is used as a buffer. As new 
data is typed in, it is added to the buffer 
and sorted on keyword. This sorting can 
be done by rearranging the data in the 
buffer in sequential order. An alter- 
native is to keep items in unsorted order 
and maintain two pointers for each item, 
one pointing to the location of the item
which is next in sorted order, the other 
pointing to the previous item in sorted 
order. The second system eliminates un- 
necessary searching in memory, but in- 
volves longer and trickier programming. 
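
The two-pointer alternative can be sketched as follows. Entries stay in the order they were typed, and only the "next" and "previous" links change when a new entry arrives. The class and method names are illustrative assumptions, and Python lists stand in for the memory buffer.

```python
class Buffer:
    """Entries stay where they were typed; next/prev links give sorted order."""

    def __init__(self):
        self.items = []    # entries in arrival (unsorted) order
        self.next = []     # index of the following entry in sorted order, or None
        self.prev = []     # index of the preceding entry in sorted order, or None
        self.head = None   # index of the alphabetically first entry

    def add(self, keyword, text):
        new = len(self.items)
        self.items.append((keyword, text))
        self.next.append(None)
        self.prev.append(None)
        # Walk the chain to find the first entry that sorts after the new one.
        before, after = None, self.head
        while after is not None and self.items[after][0] <= keyword:
            before, after = after, self.next[after]
        # Splice the new entry in between `before` and `after`.
        self.prev[new], self.next[new] = before, after
        if before is None:
            self.head = new
        else:
            self.next[before] = new
        if after is not None:
            self.prev[after] = new

    def in_order(self):
        """Yield the entries in keyword order by following the next links."""
        i = self.head
        while i is not None:
            yield self.items[i]
            i = self.next[i]


buf = Buffer()
for word in ("machine", "best", "kite"):
    buf.add(word, "definition of " + word)
print([keyword for keyword, _ in buf.in_order()])
```

No entry is ever moved once it is stored, which is the main saving when the entries are long text strings.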

As the memory buffer accumulates 
data, it is important to keep track of 
how it is filling up. The alphabet is divid- 
ed into eight sections for accounting 
purposes. The first section may be for 
words starting with A or B, etc. Eight 
counters keep track of how many bytes 
are taken up in the buffer by different 
ranges of the alphabet. When one 
counter exceeds certain limits, the 
cassette is moved to the region of tape 
that corresponds to that range of the 
alphabet. Next, the data held in the buf- 
fer is transferred onto the tape in the 
proper location. Obviously the data 
would then be erased from the memory 
buffer to make room for more data. The 
end result is a cassette tape containing 
sorted information which is generated at
the same rate that the user is typing in 
the unsorted data. (See figure 1). If the 
data is sorted as fast as it can be input, 
what would be the advantage of greater 
throughput? The system works as fast as 
is necessary. 
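
The accounting scheme might look like the sketch below. The split of the alphabet into eight ranges, the 1,024-byte limit and the write_to_tape callback are all assumptions made for illustration; the article fixes only the number of ranges and the example that the first one covers A and B.

```python
# Hypothetical division of the alphabet into eight ranges.
RANGES = ["AB", "CD", "EFG", "HIJKL", "MNO", "PQR", "ST", "UVWXYZ"]
LIMIT = 1024  # hypothetical flush threshold, in bytes per range


def range_of(keyword):
    """Return the index of the alphabet range that holds this keyword."""
    first = keyword[0].upper()
    for i, letters in enumerate(RANGES):
        if first in letters:
            return i
    raise ValueError("keyword must start with a letter")


class RangedBuffer:
    def __init__(self, write_to_tape):
        self.entries = [[] for _ in RANGES]
        self.counts = [0] * len(RANGES)     # bytes held per alphabet range
        self.write_to_tape = write_to_tape  # callback: (range index, sorted entries)

    def add(self, keyword, text):
        r = range_of(keyword)
        self.entries[r].append((keyword, text))
        self.counts[r] += len(keyword) + len(text)
        if self.counts[r] > LIMIT:
            self.flush(r)

    def flush(self, r):
        """Write one range to its region of the tape, then free the space."""
        self.write_to_tape(r, sorted(self.entries[r]))
        self.entries[r] = []
        self.counts[r] = 0
```

In a real system the callback would position the cassette at the region for range r before writing; here it is left to the caller.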

The system is generalized: it can be 
used to make huge mailing lists, keep 
track of books in a library, and so on. It 
has two principal limitations in addition 
to speed, size and the simple nature of 
the data that it can handle. 

What can you do when you fill up a 
cassette? It may take a while, since it is 
possible to fit as many as 500,000 
characters on a digital cassette tape. 
You could maintain a master set of 
twenty-six tapes, one for each letter of 
the alphabet. Once your tape is full, it 
would be merged with the master file, an 
unwieldy process at best. This procedure 
would mean putting master tape #1 in
one cassette machine, your input
cassette in another, and starting the
merging program. After a while, the
computer could signal that it had put all 
of the A entries onto master tape #1. 
Then you would take out tape #1, 
replace it with tape #2, and so on, up to 


tape #26. This process would happen 
rarely, or as often as you would require 
an up-to-date master file. But imagine 
how much data your system could hold. 
A friend of mine humorously pointed 
out that if you were mechanically inclin- 
ed, you could automate cassette 
manipulation. It would be a hybrid of a 
jukebox and large-scale automated mass 
memory with media manipulation mech- 
anisms. An automatic, multimegabyte 
memory system for a few thousand 
dollars would be very impressive. 

The question of data bases is a bit 
tricky. Data can be abstract, highly inter- 
related, and difficult to categorize. Your 
data will be interrelated in ways that the 
data base cannot show or represent. 
There are two approaches to follow. 
One way is to design very abstract data 
structures that show relationships in- 
herently. The other is to maintain the 
simple dictionary alphabetic system and 
add several cross-reference pointers. An 
example would be “Kennedy, Jackie: see 
Onassis, Jackie.” By pursuing all the 
pointers listed under a keyword, and 
checking out all the pointers listed 
there, a tree structure is developed. A 
multicassette data system implies a 
significant amount of tape manipula- 
tion, unless you have built the jukebox 
system. 
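
Following the pointers can be sketched as a simple recursive walk. The miniature data base below is made up for illustration (only the Kennedy/Onassis cross-reference comes from the text), and the dictionary layout is an assumption standing in for the tape files.

```python
# Hypothetical miniature data base: keyword -> (entries, cross-reference keywords).
DATA = {
    "Kennedy, Jackie": ([], ["Onassis, Jackie"]),
    "Onassis, Jackie": (["(entry text)"], ["Onassis, Aristotle"]),
    "Onassis, Aristotle": (["(entry text)"], ["Onassis, Jackie"]),
}


def follow(keyword, data, seen=None, depth=0):
    """Return (depth, keyword) pairs reached by following pointers.

    Each keyword is visited once, so pointers that lead back to an
    earlier keyword do not cause an endless loop.
    """
    if seen is None:
        seen = set()
    if keyword in seen or keyword not in data:
        return []
    seen.add(keyword)
    found = [(depth, keyword)]
    for target in data[keyword][1]:
        found += follow(target, data, seen, depth + 1)
    return found


for depth, keyword in follow("Kennedy, Jackie", DATA):
    print("  " * depth + keyword)
```

On tape, each step of this walk costs a search for the next keyword, which is why the amount of tape manipulation grows quickly with the number of pointers.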

Although a pointer system is a bit 
crude, it can be handled automatically. 
The following example illustrates a 
typical entry. The original data entry: 
“Beethoven, Ludwig van, Symphony 
Number 3 (The Eroica)” would be filed 
under “Beethoven, Ludwig van.” If the 
user wants to generate the cross- 
reference pointer “Eroica Symphony, 
see Beethoven, Ludwig van,” a special 
character could be typed before 
“Beethoven, Ludwig van” which the pro- 
gram would recognize. The program 
could then add “Eroica Symphony, see 
Beethoven, Ludwig van” to the text buf- 
fer. This would insure that the pointer 
and the data match, eliminating a prob- 
lem with typographical errors. 

Later, if it is necessary to eliminate 
the entry, you would know that the 
cross-reference pointer is also in 
memory because of the existence of the 
special character. Other special func- 
tions can be implemented by special 
characters, such as labeling the data
source or facilitating tabular data. This
is left as an exercise for the reader. The
power of this information handling
system is limited mainly by the size of
programs that can be stored in memory,
and by the speed of the tape recorder.




The Computer Information Arrange- 
ment needs five separate programs 
to work properly. Note that it is not 
necessary for more than one to be in 
memory at any time. Program 1, the in- 
put program, is the biggest and most dif- 
ficult. It accepts characters from the 
keyboard, edits them, adds the cross- 
reference, puts them in the buffer, 
recognizes when the tape machine is 
idle or part of the alphabet range is get- 
ting full in the buffer, and spreads the 
data on the input cassette by means of a 
linear hashing formula. It may be 
necessary for the tape recorder to be 
controlled by a separate microprocessor 
and 1 K bytes of programmable memory 
shared by both processors, because of 
timing considerations. The second one is 
the merge program, which merges the in- 
put tape with the master set of cassettes. 
The third program, called the clean up, 
goes through tapes, “unbunches” data, 
and straightens out any local area that 
gets “out of sort.” The fourth program is 
the display program. The user can tell it 
to display the Richard Nixon file, 
whereupon it will display all the 
references and pointers that are filed 
under the keyword “Nixon, Richard.” 
The last program does a crucial, but 
easily forgotten job: altering or deleting 
outdated or incorrect data from the in- 
put tape or from a master tape. 


The CIA is a general computer- 
information arrangement, an answer 
machine, or a list maker. Put in random- 
ly ordered data and it comes out neat 
and organized. The arrangement has 
many applications: small business, jour- 
nalism, research, or help for folks who 
have trouble organizing things. This is 
the type of program that will sell small 
systems to the world.


GLOSSARY 


Alphabet range: a part of the contiguous alphabet 
used to decide where to store alphabetically 
sorted data. 

Buffer: a section of random access memory used 
to temporarily store data until enough is collected 
to pass on. 

Cross-reference: a notation to look elsewhere in 
the data base for more information. 

Data base: a collection of information and the 
system used to organize it for use by a computer 
program. 

Entry: a block of data that stays together during
the sorting routine. The end of an entry is
recognized by a special termination character.

File: a set of entries with the same keyword.

Input tape: a cassette that accepts the sorted
data. For larger data bases, it must be merged
with the master tapes.

Keyword: a word in an entry used to sort the en- 
tire entry into the data base. 

Master tapes: a set of cassettes that make up the 
entire data base. Each cassette covers a portion 
of the alphabet range. 


Computer Information 


Arrangement (Upgrade #1) 


Bill Roch 


Mr Holladay’s article “Computer In- 
formation Arrangement,” in the October 
1977 issue of BYTE, describes a method 
of creating and maintaining data for a 
data-retrieval system. Obviously, the
right way to do the job is with a large
disk system, an expensive proposition.
Even using floppy disks, there is a 
substantial investment involved. 

The system described in this article 
has the advantage of handling large 
amounts of sequential data with only 
one cassette interface and only two or 
three recorders. The heart of the system 
is a multiple-cassette controller that 
operates under program control and
handles up to four cassette recorders 
with one Tarbell cassette interface 
board. There is no question that tape 
cassettes are slower than disks and that 
they are sequential, but by using the 
speed of the Tarbell board (ie: 540 bytes 
per second, 2200 bits per inch) and com- 
pressing out the blanks, large amounts 
of data can be handled rapidly. 

Only two programs, plus an optional
third sort program, are necessary to
operate the system:


• an input program that builds input
  records in sequential order and
  writes them to tape

• an update program that reads the
  input tape and updates the old
  master tape to a new master tape

• an optional sort program that sorts
  the input tape so inputs may be
  made in random order


Actually, one program will do the job 
if updating is done sequentially from the 
keyboard. A single file maintenance pro- 
gram is more complicated than the 


multiple-programs system, and will be 
described later. 


Input Program 


This program displays or prints out 
the field name, then accepts keyboard 
input. When all the fields are full, the 
program allows the user to correct the 
record if necessary; otherwise, it writes 
the record to tape. Refer to the input 
program in figure 1 for step numbers. 

In step 1, the maximum number of 
records to be input is entered. This is 
important if the records are to be sorted, 
since memory limitations dictate the 
quantity of records a sort program can 
handle. The RO-CHE Systems Cassette 
Operating System (RCSCOS) software 
that comes with the multiple-cassette 
controller opens and closes files and 
allows files to be named. In step 2, the 
file is named and opened. (The ability to
name files is included in a section of the
BASIC program that turns control of the
program over from BASIC to the input/output
(I/O) driver.) At this time, a beginning file
mark is written on the tape and the file 
name is written. 

The input aspect of the program 
(steps 3 thru 7) consists of a loop that 
displays the field names and asks for the 
appropriate field value for the current 
record. The first field should be used as 
a code to later inform the update pro- 
gram of the type of record it must han- 
dle (ie: add, delete, or replace). This field 
should also be used to enter an end-of- 
program indicator. After each field is 
filled with information, the next field 
comes up until all fields are filled. Step 7 
provides the option of correcting the 
records. With some extra coding, it is
possible to redo only a particular field
or to jump back to the previous field.

Now that we have several valid fields, 
we can do the following: 


• Write each field separately with
  RCSCOS placing delimiters between
  fields.

• Concatenate the fields together
  with a delimiter between each
  field.

• Pad each field with blanks to its
  full size, then concatenate the
  fields together.
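
The second and third options can be sketched as follows. The field widths, the delimiter character and the function names are hypothetical; the article does not say what RCSCOS or the author's BASIC program actually uses.

```python
FIELD_SIZES = [20, 30, 5]  # hypothetical field widths, in characters
DELIM = "|"                # hypothetical delimiter, assumed absent from the data


def join_delimited(fields):
    """Option 2: one record, with a delimiter between each field."""
    return DELIM.join(fields)


def join_padded(fields):
    """Option 3: pad every field with blanks to its full size, then concatenate."""
    return "".join(f.ljust(size) for f, size in zip(fields, FIELD_SIZES))


def split_padded(record):
    """Recover the fields of a padded record from their fixed positions."""
    fields, pos = [], 0
    for size in FIELD_SIZES:
        fields.append(record[pos:pos + size].rstrip())
        pos += size
    return fields


fields = ["SMITH", "12 MAIN ST", "MA"]
record = join_padded(fields)
print(len(record), split_padded(record))
```

The padded form makes every record the same length, so a field can be found by position alone without scanning for delimiters.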


The RCSCOS software uses a buffer,
and each write merely moves data to the
buffer. Only when the buffer is full
(ie: 256 bytes) does the software turn on
the cassette recorder and write to the
cassette tape. The greatest advantage of
long records in the sort and update programs
is that the data is moved to and
from tape as only one record rather than
as a number of fields that make up a
record.


Figure 1: Flowchart for data entry (input) routine. In this routine, user
entry is prompted on a field by field basis. The incomplete record can
be inspected before it is written to the output tape file.
[Flowchart box labels: SET MAX RECORD COUNT; NAME AND OPEN OUTPUT FILE; DISPLAY FIELD NAME; MOVE FIELDS TO OUTPUT RECORD; WRITE OUTPUT RECORD; ADD 1 TO RECORD COUNTER; MAX RECORDS; DISPLAY INFO.]

Do not be concerned about padding a
field with blanks because blanks are
compressed out by the RCSCOS software.
For example, a twenty-character
field containing only five data
characters left-justified uses only 7
bytes on tape: 5 bytes for the data,
1 byte for a blank count indicator, and
1 byte for the blank count. Step 10 monitors the
number of valid records written. When 
this count matches the count entered in 
step 1 or when an end-program indicator 
is entered in field 1, the file is closed, as 
in step 12. When a file is closed, the re- 
mainder of the data in the buffer is writ- 
ten, followed by an end-of-file mark. 
Step 13 may be omitted, or it may be 
used to print out a record of what has 
been accomplished. 
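
The 7-byte figure can be checked with a small sketch of blank compression. The indicator value (0x00) and the rule of compressing only runs of three or more blanks are assumptions; the actual RCSCOS tape format is not given in the article.

```python
MARKER = 0x00  # hypothetical indicator byte, assumed never to occur in the data
BLANK = 0x20


def compress(field: bytes) -> bytes:
    """Replace each run of 3 to 255 blanks with MARKER followed by the run length."""
    out = bytearray()
    i = 0
    while i < len(field):
        if field[i] == BLANK:
            run = 1
            while i + run < len(field) and field[i + run] == BLANK and run < 255:
                run += 1
            if run >= 3:                      # shorter runs would not save space
                out += bytes([MARKER, run])
            else:
                out += field[i:i + run]
            i += run
        else:
            out.append(field[i])
            i += 1
    return bytes(out)


def expand(data: bytes) -> bytes:
    """Reverse compress(): MARKER plus a count becomes that many blanks."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == MARKER:
            out += b" " * data[i + 1]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


field = b"SMITH".ljust(20)   # five data characters in a twenty-character field
packed = compress(field)
print(len(field), len(packed))
```

The twenty-character field packs into 5 data bytes, 1 indicator byte and 1 count byte, matching the total quoted above.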

Another variation to this program 
would be to use two or more cassette 
recorders and presort the data within the 
input program, such as letters A thru D 
to recorder #0, E thru I to recorder #1,
etc, and have the counter keep track of 
the records being written to each tape. 
To this variation there could be added a 
routine to close a full-tape file. After the 
cassette is repositioned, a new file 
would be opened and processing could 
continue to all recorders. 


Sort Program 


If the input data is always in sequen- 
tial order, there is no need for sorting 
and this program is unnecessary; but this 
is often not the case. Refer to the 
numbered steps in figure 2. 

Step 1 of the sort program requests
the number of records to be sorted so
that arrays can be dynamically dimensioned.
In steps 2 and 3, the input and output
files are named and opened. In the
case of the input file, RCSCOS software
turns on the recorder and reads until it
finds a beginning file mark, then reads
the first physical record. The program
then checks the file name keyed in against
the file name of the input file. The file
name may be overridden if desired. For
the output file, only the file mark and
file name are written to tape.

The input file is read into memory in
steps 4 thru 7. As each record is read into
its array, two additional arrays are
created. One contains the record ID
which uniquely identifies that record.
The other is a number array which carries
the subscript number of the record
read (ie: the record count number).


Use whatever sort routine you prefer 
(step 8) to sort the two new arrays 
(record ID and record number) in order 
by record ID. Then write out the data
records in memory in the new sorted-subscript
order taken from the sorted
record-number array (steps 9 and 10).

To clarify this sorting technique, refer 
to table 1. 

The example file contains six name-and-address
records with the ID made
up of the first three characters of the 
name and the first character of the state. 
The index may be constructed of any 
characters in a record —your option. 
After sorting the index and record 
number only, the record-number array 
contains the record numbers in sorted 
order. That is RECNO (1) = 3, RECNO (2) 
= 4, etc. Write the records (stored in an 
array) in RECNO (ie: subscript) order and 
you will have the input data written to 
tape in sorted order. Now that the data 
has been sorted and written, the files are 
closed and some type of a completion 
message may be printed (step 12). 
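
The technique can be sketched in a few lines. The six records below are hypothetical stand-ins, chosen so that the result agrees with RECNO(1) = 3 and RECNO(2) = 4 as quoted in the text; they are not the contents of table 1.

```python
# Hypothetical name-and-state records, in the order they were typed.
records = [
    ("JONES, MARY", "TX"),
    ("SMITH, JOHN", "CA"),
    ("ADAMS, ANN", "NY"),
    ("BAKER, BOB", "OH"),
    ("WHITE, SUE", "FL"),
    ("CLARK, TOM", "WA"),
]

# Build the two extra arrays: record ID and record number (1-based, as in BASIC).
ids = [name[:3] + state[:1] for name, state in records]
recno = list(range(1, len(records) + 1))

# Sort the ID and record-number arrays together; the records are not moved.
pairs = sorted(zip(ids, recno))
ids = [record_id for record_id, _ in pairs]
recno = [number for _, number in pairs]

# "Write" the records in RECNO order.
output = [records[n - 1] for n in recno]
print(recno)
print(output[0])
```

Only the short keys and the record numbers are exchanged during the sort, so the cost of a swap does not depend on the record length.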


Update Program 


While the flowchart in figure 3 has 
more boxes, it is still a relatively short 
program. The first thing the program 
wants is the names of the three files 
—Old Master, Update, and New Master. 
The program then opens the three files 
and initializes their flags. The flags have 
the following properties: 


• The Old Master file flag is set at
  end-of-file (EOF).

• The Update file flag is set at EOF.

• The New Master file flag is set
  when the Old Master file is to be
  read, and reset when the Old
  Master file is to be skipped.


Figure 2: Flowchart for in-memory sort routine. Although the entire file
of records to be sorted must be in memory, the entire record is not
manipulated during the sort. Instead, the record key and relative
record number are sorted together, and the sequence of relative
record numbers is used to index the array containing the body of each
record.
[Flowchart box labels: NAME AND OPEN INPUT FILE; NAME AND OPEN OUTPUT FILE; READ INPUT RECORD INTO ARRAY; SAVE RECORD ID IN ARRAY; SAVE RECORD NUMBER IN ARRAY; SUMMARIZE INFORMATION.]


Table 1: A simple sort example. The input data is encoded as the first three letters of the last name, and the first
letter of the state. The data is sorted by rearranging the indexes in the record number array.
[Table body not recoverable. Column headings: RECORDS IN ARRAY; BEFORE SORT (SORT INDEX, RCRD NO); AFTER SORT (SORT INDEX).]


In step 4, the Update EOF flag is 
tested. If the Update file is at EOF, no 
further attempt is made to read the Up- 
date file; otherwise, an update read 
takes place. If the data read is an EOF 
mark, the Update EOF flag is set; other- 
wise, the first character, which indicates 
the type of record (ie: add, delete, or 
replace) is saved as well as the Update 
record ID. 

In steps 8 thru 11, the same process is
carried out for the Old Master file except
that only the ID is saved. Step 9 checks
to determine if the Old Master should be
read on this pass through the loop. Step
12 tests to see if both files have reached
EOF; if they have, the file updating is complete.

If either file has not reached EOF, a 
test is made in step 13 to see which is 
still active. If the Update file is at EOF, 
the last Old Master record read is writ- 
ten to the New Master file and the pro- 
gram loops until the Old Master EOF is 
reached. If the Update file is still active 
and the Old Master file is at EOF then 
the Update record is written if it is an 
add record. 

If both files are active then an ID
comparison is made in steps 18 and 19.
If the Update ID is less than the Old Master
ID, the Update record is written, provided
it is an add record (steps 25 and 26).
Step 24 tells the program not to read
another Old Master file record next time
through the read loop.


Figure 3: Flowchart for file update routine. This routine assumes the existence of the file to be updated (Old Master
file) and a file containing the updates to be made (Update file). Each Update record contains the record type (add,
replace, or delete) and a record ID (used as a key for sorting).
[Flowchart box labels: OPEN OLD MASTER, UPDATE, NEW MASTER FILES; READ UPDATE FILE, GET TYPE, GET ID; SET UPDATE END OF FILE FLAG; SET OLD MASTER EOF FLAG; SET READ OLD MASTER FLAG; RESET READ OLD MASTER FLAG; ERROR RECORD MESSAGES; SUMMARY INFORMATION.]

If the Update file record ID is greater
than the Old Master file record ID, then
write the Old Master record, set the flag
to read the Old Master file, and then
read it (steps 19 back to 15).

If both IDs match, then the Update
record must be a replace or delete
record. If it is a replace record, write it
and set the flag to read the Old Master,
then loop back and read in the next Update
and Old Master records. If the
record is a delete record, set the flag to
read the Old Master, then read a second
Update and Old Master record without
writing the first Old Master record; this
deletes a record.

There should be an edit step in the in- 
put program to prevent any invalid 
record type codes being used for add, 
delete, and replace. Error messages can 
be printed out (step 27) that indicate an 
attempt was made to do one of the 
following: 


@ add a record to an existing record 
@ delete a nonexisting record; or 
@ replace a nonexisting record 


The use of error messages of this type
helps insure that invalid transactions do
not happen. This will also help insure the
integrity of the New Master file. For ease
of identification, you may want to print
out the ID of the erroneous update
record.

A count of transactions and number 
of records in the file may be printed 
prior to closing the files in step 29. 
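
The update logic described above can be condensed into the following sketch. Lists stand in for the three tape files, the type codes "A", "D" and "R" are assumed, and the flowchart's flags are replaced by two list positions, so this shows the comparison rules rather than the author's BASIC program.

```python
def update_master(old_master, updates):
    """Merge sorted updates into a sorted master file.

    old_master: list of (id, data) in ID order.
    updates:    list of (type, id, data) in ID order; type is 'A', 'D' or 'R'.
    Returns (new_master, errors).
    """
    new_master, errors = [], []
    i = j = 0
    while i < len(old_master) or j < len(updates):
        if j == len(updates):
            # Update file at EOF: copy the rest of the Old Master.
            new_master.append(old_master[i])
            i += 1
        elif i == len(old_master) or updates[j][1] < old_master[i][0]:
            # Update ID is lower (or Old Master at EOF): only an add is valid.
            kind, rid, data = updates[j]
            if kind == "A":
                new_master.append((rid, data))
            else:
                errors.append(f"{kind} {rid}: record does not exist")
            j += 1
        elif updates[j][1] > old_master[i][0]:
            # Old Master ID is lower: copy it unchanged.
            new_master.append(old_master[i])
            i += 1
        else:
            # IDs match: replace or delete; an add is an error.
            kind, rid, data = updates[j]
            if kind == "R":
                new_master.append((rid, data))
            elif kind == "A":
                errors.append(f"A {rid}: record already exists")
                new_master.append(old_master[i])
            # For 'D' nothing is written, which deletes the record.
            i += 1
            j += 1
    return new_master, errors


old = [(10, "a"), (20, "b"), (30, "c")]
upd = [("A", 5, "n"), ("R", 20, "B"), ("D", 30, None), ("D", 40, None)]
new, errs = update_master(old, upd)
print(new)
print(errs)
```

Both inputs must already be in ID order, which is why the optional sort program exists. The sketch also assumes at most one update per ID.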

The three programs described here
should use less than six pages of code at
one statement per line. In addition,
each program needs about half a page of
code for the BASIC I/O routine used
with the BASIC I/O driver that comes
with the multiple-cassette controller.
This driver acts as a software interface
between the BASIC program and the
cassette operating system.

Let us turn to the single program that 
updates the Old Master file to a New 
Master file. This program would be a 
combination of the input program and 
the update program. 

One version would be to build the update
data in an array, keeping the array in
sequential order by sorting it as you
build it, and then have the program read
the update data from the array as it would
from the update tape. Another version would be for
the user to enter the record type and the 
record ID at the point where the update 
program normally reads the Update file. 
The program would then read the Old 
Master file and write the New Master 
file until an ID match was found or the 
Old Master ID was greater than the ID 
typed from the keyboard. The following 
action would take place: 


• Add: when the Old Master ID is
  greater than the ID typed in, the
  program would stop and the new
  record could be typed in, and then
  written to the New Master file.

• Delete: when the IDs match, the
  next Old Master record is read
  without writing the matched one.

• Replace: when the IDs match,
  the new record would be typed in,
  replacing the record last read.


After any of the above transactions take 
place, the program accepts new add, 
delete, replace, or end commands. 

One of the most important advan- 
tages of this type of file-handling system 
is the cost for the work it can do. $40 
recorders become your tape drives. The 
Tarbell cassette interface kit from
Tarbell Electronics, 950 Dovlen Place, 
Suite B, Carson CA 90746, costs $120, 
and the Multi-Cassette Controller kit 
from Elliam Associates, 24000 Bessemer 
St, Woodland Hills CA 91367, sells for 
$55 for the two-port, and $70 for the 
four-port model, including software to 
operate with MITS BASIC versions 8 K 
3.1, 8 K 4.0 and extended 4.0. Elliam 
Associates also offers a table-driven, 
file-maintenance system written in 
BASIC, similar to the one described, on 
cassette (Tarbell format) for $20. 


Summary 


The programs described in this article 
provide the means for building and 
maintaining sequential files that can 
contain any type of data desired. Only 
string records have been described here, 
but records containing numbers can also 
be handled. This means that in addition 
to the mailing list-type files, accounting 
type information can also be main- 
tained. The fact is, this system allows the 
user to have big computer power on a 
hobby budget. 


Table Manners: An Introduction 
and Guide to Table Handling 


Techniques 


Timothy L Gauslin 


Did you ever have that sinking sensation
when you ran a new program for the
first time and the output appeared so
slowly that you felt you could have performed
the calculations and plotted the
results just as well by hand? Most programmers
encounter this problem entirely
too often. A debugging effort
usually leads to the discovery of some
form of array manipulation as the
culprit behind the performance problem.

In this article I will examine some
common and not so common techniques
for handling the loading and
searching of arrays. The methods being
used are usually applied to handling
string data in single-dimensioned arrays
which are often called tables. The
methods I will employ in building and
searching tables are quite straightforward.
There are many techniques other
than those included in this article for
arranging and finding data within tables.
The obscure program code and complex
math often associated with these other
techniques prompt me to leave them to
the programming and math gurus.

Let us look, then, at what remains that
is straightforward, efficient, and programmable
on a microcomputer. In progressive
steps we will start with the way
most programmers build and search
tables, and work our way toward the
exotic.

The following techniques will be ex- 
amined: 


• serial table loads and searches
• binary table searches
• bubble sorts
• percolated table searches
• hashed table loads and searches
• indexed tables


Each of these techniques is presented in 
its simplest form. It is left to the reader 
to research and develop further im- 
provements to each technique. 


Definitions 


To aid in understanding the varied 
approaches to array handling which will 
be examined, some common under- 
standing of the terms used is in order. 
First to clarify the definition of an array, 
or table, as it is used in this discussion: 


An array is a programming-language
description of a series of data
elements having common attributes.
An array often eliminates the need for
separate entries for repeated data,
since it can indicate the number of
times data with identical format is
repeated. Reference to single
elements within an array is accomplished
through the use of a subscript
following the data-element name.


The elements of an array (or table) consist of a key field and one
or more related data fields. The key field is examined during table
loads and searches to place or locate the related data fields within
the table.

Some programming languages provide a facility to define the
key and data fields of a table through a single definition. A COBOL
example of such a definition is provided herein.


    01  ZIP-CODE-TABLE.
        05  ZIP-TABLE-ENTRY OCCURS 1000 TIMES.
            10  ZIP-CODE    PIC 9(5).
            10  CITY        PIC X(20).


Other programming languages, such as BASIC, require that
each element of the table be defined using a separate definition.
For purposes of this article such multiple definitions will be considered
to be a single table. A BASIC definition similar to the
COBOL example given previously is presented below:


    10 DIMENSION ZIP(1000)
    20 DIMENSION CITY$(20,1000)


Figure 1: Array definition examples.


Table 1: Definitions used in the text and figures.

Table load: A programming language routine that
accepts one or multiple elements of data and places
these elements within selected occurrences of a
table.

Table search: A routine that locates data within a
table based on an argument passed to the routine.

Table sort: A routine that accepts one or multiple
elements of data and places these elements into a
table in an ordered sequence; a table sort is a
specialized form of table load.

Table key: A data element contained within a table
that is compared to an argument presented to a
search, sort, or load routine to locate a corresponding
data entry within the table.

Argument: A data element presented to a table
search routine for location of a similar value in the
table, or load routine for inclusion in the table.

Subscript: A numeric integer element of data used
to identify a specific occurrence of data within a
table. For example, a table might contain 500
elements of data, all of which are named A$. One way
to reference, for example, the fiftieth occurrence of
A$ is A$(50).


Table 2: Legend for variable names used in
flowcharts.

a. KEY: A key data element contained within a
table.

b. DATA: A non-key data element contained
within a table or presented to a table load routine for
inclusion in a table.

c. M: The maximum number of entries to be contained
by a table, or the number of entries under consideration
by a binary search routine; the notation
INT((M+1)/2) refers to M/2, rounded up to the
nearest integer.

d. T: The maximum number of tries to be attempted
at accessing a table key value.

e. F: The current number of filled table entries.

f. S1: A data element used as a subscript.

g. S2: A data element used as a subscript.

h. PREFIX: The portion of an argument used to
address an index table.

i. PFX: A short form of PREFIX.

j. SPREFIX: A prefix save area used by the Indexed
Table Load routine.

k. SAVE: A temporary area used for storage of
argument, key, and other data elements.


This definition is not all inclusive, but 
will suffice for the purposes of this arti- 
cle. Let it be further understood that a 
single array, as defined above, may 
require multiple descriptions to define 
all of the elements of data to be housed 
by the array. Figure 1 serves to illustrate 
this fact. 
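
The same distinction can be seen in a modern language. The sketch below is in Python rather than COBOL or BASIC, and the ZIP code and city stored in it are illustrative values only. It shows one table described by a single definition and the same table described by two parallel arrays.

```python
from dataclasses import dataclass


@dataclass
class ZipEntry:
    """Single definition holding the key and its data, like the COBOL form."""
    zip_code: int
    city: str


# One table of 1,000 entries, each with a key field and a data field.
zip_table = [ZipEntry(0, "") for _ in range(1000)]

# Two parallel arrays, like the BASIC form: element i of each belongs together.
zip_codes = [0] * 1000
cities = [""] * 1000

# Fill the fiftieth occurrence in both forms. Python subscripts start at 0,
# so the fiftieth entry is subscript 49 (it would be 50 in BASIC).
zip_table[49] = ZipEntry(94583, "San Ramon")
zip_codes[49], cities[49] = 94583, "San Ramon"

print(zip_table[49].city, cities[49])
```

With parallel arrays it is the program's job to keep the subscripts in step; every routine that moves a key must move the matching data as well.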

Some other terms used throughout 
this article are subscript, table load, 
table search, argument, and key. Table 1 
provides these definitions. A glossary of 
symbol definitions used in flowcharting 
examples presented throughout this arti- 
cle is provided in table 2. 


Serial Table Handling 


For most, logic patterns occur as a 
series of events leading to a conclusion. 
Therefore, programming the serial table 
load is the easiest way of placing infor- 
mation into a table for later access. A 
flowchart for this procedure is illus- 
trated in figure 2. 


The process involved here consists of 
acquiring an entry for the table and 
placing the entry in the next available 
slot in the table. This technique is as effi- 
cient as any that will be discussed for 
placing data into a table, but it may 
cause problems in later accessing the 
table data if the table is loaded in an 
unordered sequence. 

Assume, for example, that the table 
just created is loaded with the key 
values: 


(W, Z, X, B, Y, D, C, A) 


Each and every key value in the table 
must be examined until the key A is 
located. This table-search technique is 
known as the serial table search. Figure 
3 presents one implementation of this 
type of table lookup. 

In our example, eight iterations of the 
search logic are required to locate key 
field A. In this case the search time is 
not excessive. If, however, key field A 
were the 1000th occurrence in the table, 
it is obvious that considerable time 
would be spent in searching for this 
table entry. 
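
The serial load of figure 2 and the serial search of figure 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the article: the function names serial_load and serial_search are hypothetical, Python lists stand in for the KEY and DATA arrays, and positions are reported 1-based to match the flowcharts.

```python
def serial_load(records, max_size):
    """Place (key, data) pairs into the table in the order presented."""
    keys, data = [], []
    for key, value in records:
        if len(keys) >= max_size:
            raise OverflowError("table size exceeded")
        keys.append(key)
        data.append(value)
    return keys, data


def serial_search(keys, argument):
    """Return (position, tries); position is 1-based, or None if unmatched."""
    for tries, key in enumerate(keys, start=1):
        if key == argument:
            return tries, tries
    return None, len(keys)


# The unordered example table from the text; the data values are made up.
keys, data = serial_load([(k, k.lower()) for k in "WZXBYDCA"], max_size=8)
position, tries = serial_search(keys, "A")
print(position, tries)  # 8 8
```

Because key A was loaded last, every one of the eight entries is examined before it is found, which is the behavior the text describes.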


Binary Table Searches 


If the table described above is 
ordered (ie: sequenced by key) as shown 
below, certain assumptions can be made 
about the relative location of any key 
element within the table: 


(A, B, C, D, W, X, Y, Z) 


First, it may be assumed that the key 
being searched for is located higher in 
the table if the key we are presently 
looking at has a value less than that of 
the desired key, and vice versa. 

If the search of an ordered table is 
begun in the middle, logically the size of 
the table can be halved each time we 
fail to locate the desired key. For 
instance, if a table contains one hundred 
entries, and the fiftieth entry is greater 
than the key being searched for, we 
know that the desired key is somewhere 
within entries 1 thru 49. The max- 
imum number of tries it should take to 
locate a key within a certain size table 
can be computed. Since the table is 
halved each time it is examined, in 
essence we are converting the number 
of entries in the table to a binary value. 
The number of digits in this binary 
number, plus 1, gives the maximum 
number of attempts necessary to locate 
a key within the table. 

When converted to binary, the one hundred 
occurrences in our sample table pro- 
duce the number 1100100. The seven 
digits in this number, plus 1, reveal that 
a maximum of eight attempts would be 
made to locate a key value within this 
table. The information available may 
now be combined to produce the binary- 
search algorithm illustrated by figure 4. 
Note that the average number of 
attempts at locating data in a table of 
one hundred entries has been reduced 
from fifty for the serial search to seven 
for the binary search. (The average 
number of attempts required to locate a 
key using the binary search is approx- 
imately one less than the maximum 
number of attempts, or 8 - 1 = 7 in this 
case.) The larger the table, the more 
significant is the saving realized in using 
binary-search techniques. Due to the ad- 
ditional overhead of the binary-search 
algorithm, however, it should be limited 
to use in searching tables of twenty-five 
entries or more. 
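
The binary search described above, with its halving range M1 and its try counter T, might be sketched in Python as follows. This is a hedged illustration and not the article's own code. The function name binary_search is hypothetical. The starting range is taken as M/2 rounded up (an assumption for odd table sizes), and a subscript that steps outside the table is treated as "too high" or "too low" rather than as an error, which is also an assumption, since the flowchart does not show that case.

```python
def binary_search(keys, argument):
    """Search an ascending list; return a 1-based position, or None."""
    m = len(keys)
    if m == 0:
        return None
    width = (m + 1) // 2          # M1: range of search, M/2 rounded up
    tries = 1 + m.bit_length()    # T: 1 + number of binary digits in M
    s1 = width                    # S1: current 1-based table subscript
    while tries > 0:
        in_table = 1 <= s1 <= m
        if in_table and keys[s1 - 1] == argument:
            return s1
        width = (width + 1) // 2  # halve the range, rounding up
        # Move up if the current key is too low (or S1 fell below the table);
        # otherwise move down (current key too high, or S1 past the end).
        if s1 < 1 or (in_table and argument > keys[s1 - 1]):
            s1 += width
        else:
            s1 -= width
        tries -= 1
    return None


ordered = list("ABCDWXYZ")
print(binary_search(ordered, "A"), binary_search(ordered, "Z"))  # 1 8
```

For a table of one hundred entries, `(100).bit_length()` is 7, so the sketch allows the same eight tries computed in the text.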


Bubble Sorts 


As you may have already noticed, the 
primary limitation to the use of binary 
searches is the requirement that the 
table being searched must be sequen- 
tially ordered. If this is not the case, 


INITIALIZE TABLE SUBSCRIPT 
(S1=0) 
INITIALIZE M=SIZE OF TABLE 


CHECK TO SEE IF 
SUBSCRIPT HAS 
REACHED TABLE SIZE 


INDICATE ERROR 
TABLE SIZE EXCEEDED 


GET FIELDS FOR NEXT 
RECORD TO BE LOADED 
INTO TABLE 

(KEY, DATA) 


INCREMENT S1 TO POINT 
TO NEXT EMPTY SUBSCRIPT 
IN TABLE 


(S1=S1+1) 


LOAD NEW RECORD INTO TABLE 
(KEY(S1)= KEY 
DATA(S1)=DATA) 


Figure 2: Flowchart for serial table load. In this load, data is placed in 
the table in the order in which it is presented to the table loading 
algorithm. 


CHECK TO SEE IF 
SUBSCRIPT HAS 
REACHED TABLE SIZE 


INDICATE UNMATCHED 
ARGUMENT 


INCREMENT S1 TO POINT 
TO NEXT TABLE ENTRY 
(S1=S1+1) 


IS KEY BEING 
SEARCHED FOR EQUAL 
TO THE KEY OF THE 
CURRENT TABLE ENTRY? 


Figure 3: Flowchart for a serial table search. In this search, each table 
entry is examined in turn until the key value matching the argument is 
found. 




apply the bubble sort during entry of 
table data to overcome this obstacle. 
The flowchart for the bubble sort is 
given in figure 5, where it can be seen 
that the table to be loaded with data is 
first assumed to be empty (F = 0). The 
first argument to the bubble sort is 
stored at table entry 1, and subscripts 
for future entries are initialized. For 
each subsequent argument passed to 
the bubble sort, the highest active entry 
in the table (that is, the entry furthest 
from the beginning of the table) is com- 


INITIALIZE 
M = TABLE SIZE 
M1 = RANGE OF SEARCH = M/2 
T = MAX NUMBER OF 
TRIES NEEDED = 
1 + (NUMBER OF DIGITS 
IN M EXPRESSED 
IN BINARY) 
S1 = TABLE SUBSCRIPT 
(S1=M1) 


IS KEY BEING 
SEARCHED FOR EQUAL 
TO THE KEY OF THE 
CURRENT ENTRY? 


HALVE RANGE OF 
SEARCH, ROUNDING UP 
(M1=INT((M1+1)/2)) 


IS KEY BEING SEARCHED 
FOR GREATER THAN 
THE KEY OF THE 
CURRENT ENTRY? 


DECREASE TABLE 
SUBSCRIPT 
(S1=S1-M1) 


INCREASE TABLE 
SUBSCRIPT 
(S1=S1+M1) 


DECREASE NUMBER 
OF TRIES LEFT 
(T=T-1) 


ARE THERE ANY 
TRIES LEFT? 


NO 
INDICATE UNMATCHED 
ARGUMENT 


Figure 4: Flowchart for a binary search. In this search, the number of 
table entries under consideration is halved with each search iteration 
until a key that matches the argument is found; the table is assumed to 
be sorted by ascending key. In the example in the text, the table size is 
100, giving the number of tries needed in the search as T=8. 




pared to the argument. Each time the 
argument compares lower than the key 
value in the table, the key is shifted up- 
ward (opening a “hole” in the table), the 
subscript pointing to the active entry in 
the table being looked at is 
decremented, and the comparison is 
repeated. When the proper (ie: ordered) 
location for the argument is discovered, 
the hole left by the prior pass through 
the bubble sort is filled with the argu- 
ment value. The end result of this table 
manipulation is an ordered table pro- 
duced from unordered input data. We 
may now apply the binary search in 
referencing our table. 
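
The load-time sort just described can be sketched in Python as shown here. It is an illustration under stated assumptions, not the article's code: the name bubble_load is hypothetical, and Python lists replace the KEY and DATA arrays. Present-day texts would usually call this procedure an insertion sort; the article's term is kept in the function name.

```python
def bubble_load(records, max_size):
    """Load (key, data) pairs so that the table stays in ascending key order."""
    keys, data = [], []
    for key, value in records:
        if len(keys) >= max_size:
            raise OverflowError("table size exceeded")
        keys.append(None)           # open a hole at the end of the table
        data.append(None)
        hole = len(keys) - 1        # plays the part of S1; hole - 1 is S2
        while hole > 0 and key < keys[hole - 1]:
            keys[hole] = keys[hole - 1]   # shift the larger key upward
            data[hole] = data[hole - 1]
            hole -= 1                     # the hole moves toward the start
        keys[hole] = key            # fill the hole with the new record
        data[hole] = value
    return keys, data


keys, data = bubble_load([(k, k.lower()) for k in "WZXBYDCA"], max_size=8)
print("".join(keys))  # ABCDWXYZ
```

The unordered keys from the serial example come out in the ordered sequence that the binary search requires.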

One other item: since the bubble sort 
is invoked only once per table entry, it is 
considered far more efficient than the 
use of the serial search, which examines 
each table entry an untold number of 
times based on the number of 
arguments presented to the search. 


Percolated Table Searches 


The table-handling techniques 
discussed so far assume that the key 
values being searched for are uniformly 
distributed throughout the table. In 
many applications, a few entries within 
the table receive the bulk of attention 
during processing. A prime example of 
this type of processing is a mailing list, 
where zip codes from a common area 
are used to retrieve city names from a 
table. 

Assume that a computer club is 
located within a moderately sized town, 
with membership centered within the 
town. A mailing list maintained by this 
club would certainly reference zip 
codes within the town more frequently 
than those representing outlying areas. 
One is able to cause these high-activity 
table entries to be located more rapidly 
by applying the percolated table search. 
The flowchart for the percolated search 
is illustrated by figure 6. 


In the illustration, the table to be 
searched has previously been loaded us- 
ing the serial table load. A comparison 
of figures 3 and 6 will show that the per- 
colated search is nothing more than a 
serial search with additional logic ap- 
pended to flip a key satisfying an argu- 
ment with the key immediately below it 
(ie: closer to the beginning of the table). 
In a high-volume processing environ- 
ment, the net result of this logic moves 
high-activity table keys downward in the 
table and low-activity keys upward in 
the table. Thus high-activity keys are 


located by the serial search with a 
smaller number of tries, even though the 
table is unordered. In effect, what we 
have is a table ordered by activity rather 
than by key value. 
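
A percolated search can be sketched in Python by adding one swap to a serial search. The sketch below is illustrative only. The name percolated_search is hypothetical, indexes are 0-based as is usual in Python, and the guard that leaves the first entry in place when it matches is an assumption, since the flowchart does not show what happens when there is no entry before the match.

```python
def percolated_search(keys, data, argument):
    """Serial search that swaps a matching entry with the one before it.

    Returns the 0-based index where the entry now sits, or None if unmatched.
    """
    for i, key in enumerate(keys):
        if key == argument:
            if i == 0:
                return 0    # already at the beginning; nothing to swap with
            keys[i - 1], keys[i] = keys[i], keys[i - 1]
            data[i - 1], data[i] = data[i], data[i - 1]
            return i - 1
    return None


keys = list("WZXBYDCA")
data = list("wzxbydca")
print(percolated_search(keys, data, "A"), "".join(keys))  # 6 WZXBYDAC
```

Each successful search for A moves it one place nearer the beginning, so a frequently requested key is found in fewer tries as processing continues.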

A major advantage of the percolated 
table search is the load-sensitive nature 
of the technique. As demands against 
the table vary, the organization of the 
table changes accordingly. To take 
full advantage of this dynamic attribute, 
the table should be loaded from exter- 
nal media at the beginning of each pro- 
gram run, and stored back to external 
media at the end of each run. This en- 
sures that the order of the table 
elements reflects the relative activity 
accumulated over all uses of the table. 


Hashed Table Loads and Searches 


The fastest technique for loading and 
retrieving data from tables is the use of 
randomized, or hashed, values to deter- 
mine starting locations for table loads 
and searches. This technique is common- 
ly employed in managing the symbol 
tables of compilers and assemblers. A 
major drawback to this technique is the 
requirement for oversized tables to 
avoid the generation of an inefficient 
number of table synonyms. Synonyms 
are separate key values that attempt to 
occupy the same location within a table. 
To avoid synonyms, tables that utilize 
hashed techniques are usually estab- 
lished at least 20% oversize. 

In considering the use of hashed 
tables, two items must be given priority: 


1) Every possible argument presented 
must randomize within the number of 
entries set aside for the table. 

2) Hashed values derived from 
arguments should be distributed 
evenly within the range of the table to 
avoid synonyms. 


A simple technique for hashing the con- 
tent of a table argument is to strip the 
left half of each byte in the argument, 
packing the right halves together to 
form a binary numeric value. This value 
is then divided by the prime number 
which is closest to, yet still less than, the 
number of entries in the table. The 
remainder from this division is 
employed as the value of a subscript to 
load or retrieve the desired table entry. 
An example of this hashing algorithm, 
applied to a table of 1000 entries, is 
presented in figure 7. 
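
The hashing rule in the paragraph above can be sketched as a short Python function. The name hash_key and the default prime of 997 (the largest prime below 1000) are choices made for this illustration, and ASCII arguments are assumed. For the argument ABYZ the packed right halves form hexadecimal 129A, which is 4096 + 512 + 144 + 10 = 4762 decimal, and 4762 divided by 997 leaves a remainder of 774.

```python
def hash_key(argument, prime=997):
    """Pack the right half (low 4 bits) of each byte, then take the remainder."""
    value = 0
    for byte in argument.encode("ascii"):
        value = (value << 4) | (byte & 0x0F)  # strip the left half of the byte
    return value % prime


print(hash_key("ABYZ"), hash_key("ABYJ"))  # 774 774
```

ABYZ and ABYJ hash to the same subscript because Z (hexadecimal 5A) and J (hexadecimal 4A) share the same right half.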

In figure 7, an argument value of 
ABYZ is hashed to a subscript value of 
774 (hexadecimal 129A is 4762 decimal, 
and 4762 divided by 997 leaves a 
remainder of 774). A synonym argument 
for ABYZ is ABYJ. That is, argument 
ABYJ will also hash to a subscript 
value of 774. There is 
a total, in this example, of three 
synonyms possible for each position in 
the argument. That is, three equivalent 
characters in each position of the argu- 


INITIALIZE 
TABLE SIZE M 
INITIALIZE POINTER F 
(F=0) 


GET FIELDS FOR NEXT 
RECORD TO BE LOADED 
INTO TABLE 
(KEY, DATA) 


THERE MAY BE 
NO DATA 


INCREMENT POINTER F 
TO FIRST EMPTY 
LOCATION IN TABLE 
(F=F+1) 


S1 POINTS TO HOLE 
IN TABLE 
S2 POINTS TO TABLE 
ENTRY BELOW IT; 
INITIALIZE S1=F 
S2=F-1 


DOES SUBSCRIPT 
F POINT BEYOND 
THE SIZE OF 
THE TABLE? 


INDICATE ERROR: 
TABLE SIZE EXCEEDED 


KEY(1)= KEY 


DOES THE 
CURRENT RECORD 
FIT INTO 
THE HOLE AT 
POSITION S1? 


MOVE HOLE TOWARD 
BEGINNING OF TABLE 
AND PREPARE TO 
RETRY COMPARISONS 
(KEY(S1)=KEY(S2) 
S1=S2 
S2=S2-1) 


Figure 5: Flowchart for a bubble sort. In this sort, data is entered in an 
unordered sequence and “bubbles” from the end of the present table 
to its beginning, stopping when it is in ascending key sequence in rela- 
tion to the rest of the entries in the table. 




INITIALIZE TABLE SUBSCRIPT 
(S1=0) 
INITIALIZE M=SIZE OF TABLE 


CHECK TO SEE IF 
SUBSCRIPT HAS 
REACHED TABLE SIZE 


INDICATE UNMATCHED 
ARGUMENT 


INCREMENT S1 TO POINT TO 
NEXT TABLE ENTRY 
(S1=S1+1) 


IS KEY BEING 
SEARCHED FOR EQUAL 
TO THE KEY OF THE 
CURRENT TABLE ENTRY? 


SWITCH THE CURRENT 
(MATCHING) ELEMENT WITH 
THE ONE BEFORE IT TO 
BRING IT CLOSER TO THE 
BEGINNING OF THE TABLE 
(S2=S1-1 
SAVE = KEY(S2) 
KEY(S2) = KEY(S1) 
KEY(S1) = SAVE) 


AT END OF ALGORITHM, 
S2 IS POINTER 
TO TABLE ENTRY 
MATCHING KEY 


Figure 6: Flowchart for a percolated table search. In this search, a 
serial search is conducted, but when a match is found, the desired en- 
try is switched with the entry before it. This has the effect of moving 
frequently used entries toward the beginning of the table, where they 
can be found in fewer tries. 


Maximum number of table entries = 1000 
Prime number less than 1000 = 997 
(Note that arguments are four alphabetic characters.) 


For a sample argument of ABYZ: 
Hexadecimal (ASCII) representation = 41 42 59 5A 
With high-order nybble stripped = 129A (base 16) 
129A (base 16) = 4096 + 512 + 144 + 10 = 4762 (base 10) 
4762 ÷ 997 = 4, remainder 774 
The 774th table entry will be used for storage of key value ABYZ. 


Figure 7: The hashed table addressing algorithm. This is perhaps the 
most efficient method of storing and retrieving entries from a table. 
In this example, a four character alphabetic argument produces a 
(usually) unique subscript value in the range of 1 to 1000. 




ment will produce the same hashed 
subscript value. Therefore, there are 3^4 
possible synonyms for each table loca- 
tion, or 81 possible synonyms. There is 
also a possibility of 26^4 key values, or 
456,976 keys. Since our table will hold 
only 1000 of these possible keys, or 
about 0.2%, our synonym rate should be 
about 0.2% of 81, or approximately 0.16 
synonyms per table entry. To be safe, 
place any synonyms encountered in 
loading the table in the first vacant table 
position following the location selected 
by hashing. Assume also, based on our 
computed average synonym rate of 0.16 
synonyms per entry, that any synonyms 
will be located within five entries of the 
original hashed table location. This 
number should provide for any devia- 
tion from a normal distribution. The 
flowcharts for loading and searching 
hashed tables are presented in figures 8a 
and 8b. 
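
The hashed load and search, with synonyms placed in the first vacant entry within five tries, might be sketched in Python as below. This is a hedged sketch rather than the article's code. The names hash_key, hashed_load and hashed_search are hypothetical, the table is a 0-based Python list in which None marks a vacant entry, and stopping at the end of the table instead of wrapping around is an assumption.

```python
PRIME = 997       # largest prime below the table size of 1000
MAX_TRIES = 5     # synonyms are expected within five entries of the hashed slot


def hash_key(argument):
    """Pack the low 4 bits of each ASCII byte and take the remainder."""
    value = 0
    for byte in argument.encode("ascii"):
        value = (value << 4) | (byte & 0x0F)
    return value % PRIME


def hashed_load(table, key, data):
    """Store (key, data) in the first vacant entry; return the slot used."""
    s1 = hash_key(key)
    for _ in range(MAX_TRIES):
        if s1 >= len(table):
            break
        if table[s1] is None:
            table[s1] = (key, data)
            return s1
        s1 += 1                      # try the next higher entry
    raise OverflowError("no vacant entry within %d tries" % MAX_TRIES)


def hashed_search(table, key):
    """Return the data stored for key, or None if it is not found."""
    s1 = hash_key(key)
    for _ in range(MAX_TRIES):
        if s1 >= len(table):
            break
        if table[s1] is not None and table[s1][0] == key:
            return table[s1][1]
        s1 += 1
    return None


table = [None] * 1000
print(hashed_load(table, "ABYZ", "first"), hashed_load(table, "ABYJ", "second"))  # 774 775
```

The synonym ABYJ lands in the entry immediately after ABYZ, and a later search for either key finds it within the five-entry limit.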


Indexed Tables 


The final area of discussion centers 
around the use of multiple tables to op- 
timize the use of a cumbersome single 
table. 

Let us make an extension to the 
earlier zip code example. The normal 
approach to a mailing list would require 
that each record contain the member's 
name, his street address, city, state, and 
zip code. But since the members all 
come from the same area, a better way 
to form the mailing list is as follows: 
remove the city and state fields from the 
mailing list file and, instead, build a 
table with zip code as key and city and 
state as data fields. When a label is 
printed, use the zip code, which is in the 
mailing list record, to find the appro- 
priate city and state in the table. 

When the (sorted) zip code table 
records are loaded into their own table, 
an index table that points to the begin- 
ning of a zip code group can be used to 
narrow down the range of a particular 
search method. (See figure 9a.) In this ex- 
ample, the indexed table lookup with a 
binary search will be followed. 

Since the computer club’s member- 
ship represents only a single geographic 
area, only the last four digits of the zip 
code need be used (assuming a constant 
value for the first digit). We may now 
segment the four-digit zip codes into a 
zip code prefix (ie: the second and third 
digits) and suffix (ie: the fourth and fifth 
digits). As the zip code table is loaded, I 
will track the first occurrence of each 
zip code prefix and record its location in 
the zip code table through an entry in 
the index table. The index table will con- 
tain one hundred entries representing 
zip code prefixes 00 thru 99. Thus the 
first entry in the index table will repre- 
sent zip prefix 00 and will contain a 
value of 01, indicating that the first en- 
try in the zip code table is zip code 00xx. 
The second entry in the index table will 
represent zip code prefix 01 and contain 
a value identifying the first occurrence 
of zip code 01xx in the zip code table, 
etc. An example of the tables generated 
by this method is given in figure 9a. 
When both tables have been 
built, there is an index table which may 
be accessed without a search (using the 
zip code prefix + 1 as a subscript) to 
identify the first location to be searched 
in the zip code table. From the point in 
the zip code table identified by the index 
table, a serial, binary, percolated, or 
other form of search may be initiated. 
Figures 9b and 9c illustrate the logic of 
an indexed table load and indexed 
binary search. 
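
An indexed load and search along these lines can be sketched in Python. The sketch is illustrative only and rests on several assumptions: zip codes are five-character strings presented in ascending order, the names indexed_load and indexed_search are hypothetical, an index value of 0 marks a prefix with no entries, and the upper limit of a search is taken from the next nonzero index entry. The standard-library function bisect_left stands in for the binary search, and the zip code 30330 is an invented sample value.

```python
from bisect import bisect_left


def indexed_load(zips):
    """Build the zip code table and a 100-entry index of first occurrences."""
    keys = []
    index = [0] * 100                # index[p] = 1-based position, 0 if none
    saved_prefix = -1                # a value that matches no real prefix
    for z in zips:
        prefix = int(z[1:3])         # digits 2 and 3 of the zip code
        keys.append(z)
        if prefix != saved_prefix:   # first occurrence of a new prefix
            index[prefix] = len(keys)
            saved_prefix = prefix
    return keys, index


def indexed_search(keys, index, z):
    """Return the 1-based position of zip code z, or None if unmatched."""
    prefix = int(z[1:3])
    low = index[prefix]
    if low == 0:
        return None                  # no zip codes were loaded with this prefix
    high = len(keys)                 # last candidate position (1-based)
    for nxt in index[prefix + 1:]:
        if nxt != 0:
            high = nxt - 1
            break
    pos = bisect_left(keys, z, low - 1, high)   # binary search in one group
    if pos < high and keys[pos] == z:
        return pos + 1
    return None


keys, index = indexed_load(["30012", "30015", "30017", "30121", "30330"])
print(index[:4], indexed_search(keys, index, "30121"))  # [1, 4, 0, 5] 4
```

The index confines each search to the group of zip codes that share a prefix, so only a handful of entries are ever compared.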


Summary 


I have now examined, at a glance, a 
variety of techniques that allow us to 
rapidly access data stored in single- 
dimensioned tables. In the case of small 


Text cont. on page 118 


Figure 8a: Flowchart for a hashed table 
load. Here, the hashed value of the key 
is used as a subscript for storing the cur- 
rent entry in the table. If the current en- 
try is already filled (this is known as a 
collision), the next higher table entry is 
tried, up to a total of five attempts. A 
hashed table cannot be entirely filled, so 
the table size must be somewhat larger 
than the expected maximum number of 
entries to be loaded into the table. 


INITIALIZE TABLE SIZE M 
INITIALIZE POINTER F 
(F=0) 


INITIALIZE OPTIMAL 
MAXIMUM SIZE 
M1=0.8×M 


GET FIELDS FOR NEXT RECORD 
TO BE LOADED INTO TABLE 
(KEY, DATA) 


THERE MAY BE 
NO DATA 


HAS LAST 
ENTRY BEEN 
PROCESSED? 


HAS NUMBER OF ENTRIES 
REACHED OPTIMAL 
TABLE SIZE? A COM- 
PARE FOR AN EQUAL 
CONDITION ISSUES 
ONE WARNING MES- 
SAGE RATHER THAN 
REPETITIVE MESSAG- 
ES. 


CONVERT KEY VALUE 
TO HASHED VALUE 
S1=HASH(KEY) 


AT THIS POINT, NO ATTEMPTS 
HAVE BEEN MADE AT STORING 
DATA (T=0) 


LOAD NEW RECORD INTO TABLE 
(KEY(S1)=KEY 
DATA(S1)=DATA) 


TRY NEXT HIGHER ENTRY 
IN TABLE 
(S1=S1+1 
T=T+1) 


INCREMENT NUMBER 
OF LOADED ELEMENTS 
COUNTER 
(F=F+1) 


TOO 
MANY TRIES 
(T>5) 
? 




NO ATTEMPT AT 
THIS POINT (T=0) 


CONVERT KEY VALUE 
BEING SEARCHED FOR 
TO HASHED VALUE 
S1=HASH(KEY) 


DOES VALUE BEING 
SEARCHED FOR MATCH 
VALUE IN TABLE? 


PREPARE FOR MATCH WITH 
NEXT HIGHER ENTRY IN TABLE 
(S1=S1+1 
T=T+1) 


TOO 
MANY TRIES 
(T>5) 
? 


Figure 8b: Flowchart for a hashed table search. In this search, the 
hashed value of the key being searched for is used as a subscript to the 
expected location of the record in the table. If the desired record is 
not in that location, the next four table entries are checked for a 
match. 


Zip codes       Index table (IND)                Zip code table (KEY) 
to be loaded 
30012           IND(1) = zip code 300xx = 01     KEY(1) = 30012 
30015           IND(2) = zip code 301xx = 04     KEY(2) = 30015 
30017           IND(3) = zip code 302xx = 00     KEY(3) = 30017 
30121           IND(4) = zip code 303xx = 05     KEY(4) = 30121 
[illegible]     [illegible]                      KEY(5) = 30130 


Figure 9a: Indexed table lookup example. In this mailing label example, the zip code 
table to be loaded, left, must be ordered in ascending key sequence. (The data 
associated with each key is not shown here.) Each key is loaded into the KEY table 
with a pointer from the IND table indicating the first occurrence of a new prefix (ie: 
digits 2 and 3 of the zip code). The associated data is loaded similarly into a data 
table. 




INITIALIZE TABLE SUBSCRIPT 
(S1=0) 


INITIALIZE CURRENT INDEX 
TABLE SUBSCRIPT VALUE 
(SPREFIX=-1) 
(IE, SET TO VALUE THAT 
WILL NOT MATCH ANY 
VALUE) 


GET FIELDS FOR NEXT RECORD 
TO BE LOADED INTO TABLE 
(ZIP, DATA) 


HAS LAST 
ENTRY BEEN 
PROCESSED? 
(THERE MAY BE NO DATA) 


EXTRACT PREFIX, INDEX TABLE 
SUBSCRIPT FROM ZIP 
(PREFIX=DIGITS 2 AND 3 OF ZIP) 


INCREMENT S1 TO POINT TO 
NEXT EMPTY SUBSCRIPT 
IN TABLE 
(S1=S1+1) 


DOES PREFIX OF 
CURRENT ZIP EQUAL 
CURRENT INDEX 
TABLE VALUE? 


LOAD POINTER TO ZIP TABLE 
INTO INDEX TABLE; SET NEW 
INDEX TABLE SUBSCRIPT VALUE 
(INDX(PREFIX+1)=S1 
SPREFIX=PREFIX) 


YES 


LOAD NEW RECORD INTO TABLE 
(KEY(S1)=ZIP 
DATA(S1)=DATA) 


Figure 9b: Flowchart for an indexed table load. This load concurrently builds two 
tables, the first of which identifies major areas within the second. It is assumed that 
records are available in ascending key sequence. In this flowchart, the zip field is being 
used as the key of a record that contains city-state pairs as data. 




INITIALIZE 
M=SIZE OF TABLE 
T=MAX NUMBER OF TRIES NEEDED 
= 1+(NUMBER OF DIGITS IN 
M EXPRESSED IN BINARY) 


EXTRACT TABLE LOOKUP INDEX 
PFX FROM ZIP CODE KEY 


LOOK UP LOW AND HIGH LIMITS OF 
SEARCH FROM INDEX TABLE 
(L=INDX(PFX+1) 
H=INDX(PFX+2)) 


SET WIDTH OF SEARCH M1, 
FIRST ELEMENT TO BE TESTED S1 
(M1=INT((H-L)/2) 
S1=L+M1) 


NO 
HALVE SEARCH WIDTH M1 
(M1=INT((M1+1)/2)) 


PREPARE TO LOOK LOWER 
IN THE TABLE 
(S1=S1-M1) 


DECREASE NUMBER OF 
TRIES REMAINING 
(T=T-1) 


INDICATE UNMATCHED ARGUMENT 


IS KEY BEING SEARCHED 
FOR EQUAL TO THE KEY 
OF THE CURRENT ENTRY? 


PREPARE TO LOOK HIGHER 
IN THE TABLE 
(S1=S1+M1) 


Figure 9c: Flowchart for an indexed binary table search. Here, the index table INDX is 
used to set the upper and lower limits of search. The index table method can be used in 
conjunction with any search method. Here, it precedes a binary search. 


Text cont. from page 115 

arrays (ie: under twenty-five entries), 
serial searches prove to be most effec- 
tive most of the time. For larger arrays, 
binary load and search techniques tend 
to be most efficient. For very large ar- 
rays, or where access to array data is 
biased, the use of percolated, hashed, or 
indexed table-handling techniques may 


prove beneficial. 

None of the techniques described in 
this article should be applied in consci- 
entious programming without thorough- 
ly reviewing the overall effect of the 
technique on the application. There is 
no greater frustration than to optimize 
yourself out of the most effective way to 
accomplish your programming goal. 


Variables Whose Values 
Are Strings 


W D Maurer 


... to write a program in which there were 
one or more variables with strings as 
their values. Many programmers, how- 
ever, are discouraged by the program- 
ming difficulties that arise in this con- 
nection, in all but the simplest cases. 
This is particularly true when space is at 
a premium and assembly language is 
used as it is in many microcomputer ap- 
plications. I will describe here two alter- 
native ways of solving these problems. 
Stylistically, these are quite different 
from each other. Each is fascinating in 
its own way, and each has certain dif- 
ficulties which have to be surmounted, 
but either one of them will solve the 
problem with which we are con- 
cerned. 

Many versions of FORTRAN allow 
variables to have strings as their values, 
but these strings cannot have lengths 
which are greater than some maximum, 
and this maximum is usually much too 
small for practical purposes. The max- 
imum is, in fact, the number of 
characters in a word, which is usually 
four or six; sometimes it is five (as on 
the PDP-10) and sometimes eight (as on 
the IBM 370, using double words), 
but in practice the strings we are con- 
cerned with are often twenty, forty, or 
even sixty characters long. In many 
COBOL programs, this problem is taken 
care of by assigning some large number 
of characters to every such variable. 


This is particularly common when the 
value of the variable is somebody's 
name and address, to be printed on an 
envelope by the computer. Often 
twenty-five characters are reserved for 
the name, twenty-five for the address, 
and twenty-five for the city, state and 
zip code. This gives rise to two kinds of 
problems. In the first place, twenty-five 
characters is not enough for an address 
like 1527 San Jose-Los Gatos Rd., even if 
we leave the period off the end. More 
important, however, is the fact that if we 
reserve that many characters for every 
name and every address, there are going 
to be quite a lot of wasted characters. 
That doesn’t matter too much in a 
COBOL program, where space, par- 
ticularly on a disk, is usually quite abun- 
dant; but on a microcomputer we would 
like to make optimum use of all the 
space we have. 

The first solution to this problem to 
consider involves the use of a large ar- 
ray, called SPACE, for the storage of 
strings. Let us consider each element of 
this array to be one character long. Then 
the first string (whose length is L1, say) is 
stored in the characters SPACE(1), 
SPACE(2) and so on up through 
SPACE(L1). The next character, 
SPACE(L1+1), contains an illegal 
character code (0, for example) to 
denote the fact that this is the end of the 
first string. The second string starts at 
SPACE(L1+2) and continues from there. 




Every string ends with a 0-character 
code, and all the strings are stored in the 
array called SPACE, in sequential order. 

Suppose now that these strings are 
supposed to be the values of variables 
K1, K2 and so on in the program. The ac- 
tual value of each of these variables will 
be an integer that indicates where the 
corresponding string starts. Thus, for ex- 
ample, if 17 is the value of K2, then 
SPACE(17) is the first character of the 
given string; SPACE(18) is the next 
character, and so on. This is the basic 
concept of a pointer: a quantity which 
indicates where another quantity is in 
memory. The pointers we have set up 
have been index pointers, but it would 
have been just as easy to set up address 
pointers. That is, instead of the integer 
17, we could have used the address, in 
memory, of the character SPACE(17). 
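
The storage scheme described so far, with strings laid end to end in SPACE, each closed by a 0 code and each reached through an index pointer, can be sketched in Python. This is only an illustration. The class name StringSpace and its methods store and fetch are hypothetical, element 0 of the list is left unused so that subscripts start at 1 as in the text, and character codes are taken from Python's ord function.

```python
class StringSpace:
    """Strings stored sequentially in one array, each ended by a 0 code."""

    def __init__(self, size):
        self.space = [0] * (size + 1)   # space[1..size]; space[0] is unused
        self.last_used = 0              # highest subscript used so far

    def store(self, text):
        """Append text and its 0 terminator; return the index pointer."""
        start = self.last_used + 1
        end = start + len(text)         # subscript of the 0 terminator
        if end >= len(self.space):
            raise MemoryError("SPACE array is full")
        for offset, ch in enumerate(text):
            self.space[start + offset] = ord(ch)
        self.space[end] = 0
        self.last_used = end
        return start

    def fetch(self, pointer):
        """Read characters from the pointer up to the 0 terminator."""
        chars = []
        while self.space[pointer] != 0:
            chars.append(chr(self.space[pointer]))
            pointer += 1
        return "".join(chars)


strings = StringSpace(64)
k1 = strings.store("SMITH")
k2 = strings.store("JOHNSON")
print(k1, k2, strings.fetch(k2))  # 1 7 JOHNSON
```

The variables k1 and k2 hold nothing but integers; the characters themselves live in the shared array.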

The basic problem that arises when 
this method is used can be seen if we 
consider the process of setting a 
variable to a new value. Suppose that 
the value of K1 is ‘SMITH’ and we want 
to change it to ‘JOHNSON’. Unfor- 
tunately, ‘JOHNSON’ has more letters in 
it than ‘SMITH’, so we cannot simply 
store the new characters in the same 
places as we stored the old ones. We 
can, however, take advantage of the 
fact that not all of our array SPACE has 
been used. Suppose that we have used 
the characters from SPACE(1) up 
through SPACE(LSPACE); then 
‘JOHNSON’ can start at 
SPACE(LSPACE +1), and we can set the 
pointer in K1 to be LSPACE+1. Of 
course, we also have to update LSPACE 
at this point, by adding to it the length 
of JOHNSON, or 7 
(plus 1, for the 0 character). 

The trouble with this method is that 
now SMITH is still in memory, together 
with its 0 character. We are not really us- 
ing all the space from SPACE(1) up 
through SPACE(LSPACE); there are five 
characters, plus a 0 character, that we 
are not using. By itself this causes no 
problems; but now consider what hap- 
pens as our program continues to run. 
Every time we have a variable with a 
string as its value, and this variable gets 
a new string as its value, we are going to 
“abandon” some of our string storage 
area, just as we did with SMITH in this 
case. Eventually, we are going to run out 
of space; the whole SPACE array will be 
used up, except for “abandoned” areas 
as above. What do we do next? 

Let us agree that, whenever we aban- 
don a string, we write a 0 character over 
the first character of that string. This 
character will immediately follow the 0 
character at the end of the preceding 
string, so that two 0 characters in a row 
will denote the start of an abandoned 
area. We can now consider the possibili- 
ty of moving all the strings backwards 
by just enough so that the abandoned 
areas disappear, as shown in figure 1. 
This is known as collapsing (or 
sometimes compactifying). If we think of 
the left side of figure 1 as a row of 
bricks, with spaces between them to 
represent the abandoned areas, then 
putting our hands on the two ends of the 
row and collapsing it would produce the 
situation shown in the right side of the 
figure. 


Figure 1: Collapsing or “compactifying” 
an array. In figure 1a A, B, C and D are 
separated by empty space (shaded area). 
In figure 1b this empty, available space is 
consolidated by moving B, C and D up so 
that they are contiguous with A. 


An algorithm to do this involves two 
pointers, I and J. As we move each 
character in SPACE, we set SPACE(J) = 
SPACE(I) and then add 1 to both I and J. 
When we have to skip over an aban- 
doned area, we increase I, but not J. 
Thus I always indicates the current 
character we are moving, and J always 
indicates the place we are moving it. At 
the start of the algorithm, both I and J 
are initialized to 1. 

There is still one difficulty. All our 
variables with string values involve 
pointers, and after the collapsing pro- 
cess has taken place, the pointers will be 
wrong. We must have some way of ad- 
justing these pointer values. There are at 
least two reasonable ways of doing this. 
One of these involves what may be 
called back pointers. The first character 
(or possibly the first two characters) of 
each string, as given in the array SPACE, 
is now some indication of which variable 
has this particular string as its value 
(such as, for example, the address of that 
variable). Whenever a back pointer is 
moved, by the operation SPACE(J) = 
[...] 

[...] contained in all the variables with string 
values are placed in an array and sorted 
in ascending order, together with back 
pointers to the given variables. As we 
are going through the SPACE array and 
setting SPACE(J) = SPACE(I), we are also 
going through this new array, from the 
beginning to the end. At each stage, the 
currently considered pointer in this ar- 
ray points to the place in the SPACE ar- 
ray that we will have to treat next, as the 
start of a string to be moved. When we 
get to this point in SPACE, we reference 
the associated back pointer and proceed 
as before; then we continue 
through the SPACE array, but also move 
forward by one position in the new ar- 
ray, so that we will be ready to treat that 
pointer when we come to it. 

Let us now pass to the second method 
of handling string values of variables. 
Again we use a large array, which we will 
call FREE this time, rather than SPACE. 
FREE is organized into groups of 
characters. To make our example con- 
crete, we will assume that each group is 
eight characters long. The first six of 
these characters are actually characters 
of the given string; the remaining two 
character positions, taken together, con- 
tain a pointer to another group of eight 
characters. 

Any string which is less than six 
characters long is stored in a single 
group. If a string is four characters long, 
for example, the last two characters are 
0 characters; this tells us that these are 
not actually to be counted as part of the 
string. A string which is more than six 
characters long is stored as a chain. 
Thus, for example, if a string is fifteen 
characters long, the first six of these 
characters appear in one group, which 
contains a pointer to another group. The 
next six characters appear in the 
second group, 
which contains a pointer to a third 
group. The last three characters appear 
in the third group, followed by three 0 
characters. 

If a string is exactly six characters 
long, it appears in a single group, but the 
pointer itself contains 0. If a string is 
twelve, eighteen, twenty-four, etc, 
characters long, it appears in more than 
one group, but the pointer in the last 
group will contain 0. In general, the 
pointer in the last group always contains 
0, and it is this, rather than the presence 
of 0 characters, that determines the fact 
that it is the last group. 

We thus have one or more chains 
(sometimes called simple lists) which in- 
volve various eight-character groups in 
FREE. We are now in a position to make 
use of a basic idea in advanced program- 
ming techniques: the list of available 
space. In this case, the list of available 
space is a chain which contains all those 
eight-character groups, and only those 
groups, which are not on any other 
chain. That is, we think of all these 
groups as being in some order: the order 
is of no consequence. Then the first 
group, in this order, contains a pointer to 
the second group; the second group con- 
tains a pointer to the third, and so on, up 
to the last group, which contains a 0 
pointer. 

We use a list of available space 
because it is now no longer necessary to 
use a collapsing process, as described in 
connection with the previous string 
storage method. In particular, we are no 
longer “abandoning” anything, as we 
were before. All we have to do is to 
make sure that, at all times, every group 
into which FREE is divided is on some 
chain, either the list of available space, 
or a chain which represents the string 
value of some variable. 

There are also programs which use a 
list of available space, but in which 
some groups are abandoned, and a pro- 
cess somewhat like collapsing, known as 
garbage collection, is used to collect all 
these abandoned groups into a new list 
of available space. This, however, is 
necessary only when the various chains 
contain pointers to each other, which is 
not the case in the present application. 

By a pointer to a group, we mean a 
pointer to the first character in the 
group. Thus if K is such a pointer, then 
the group consists of FREE(K), 
FREE(K+1) and so on up through 
FREE(K+7). We will assume that 
FREE(K) through FREE(K+5) are the six 
characters in the group, and that 
FREE(K+6) and FREE(K+7), taken 
together, are the pointer to the next 
group. A variable called LAVS (for “list 
of available space”) contains, at all 
times, a pointer to the first group in the 
list of available space. The basic opera- 
tions on the list of available space are 
taking one group off the front of 
this list and adding a new group 
to it. The first of these operations, 




removing a group from the list of 
available space, is performed as follows: 


• Set K = LAVS; the new group will consist of FREE(K) through
  FREE(K+7).

• Since this group is no longer to be on the list of available space,
  the first group in this list is now what used to be the second
  group. But a pointer to this second group is currently in FREE(K+6)
  and FREE(K+7). This pointer now has to be taken and put into LAVS,
  because LAVS must contain, at all times, a pointer to the first
  group in the list of available space.


The second of our two operations, adding a group to the list of
available space, is performed as follows:


• Suppose that FREE(K) through FREE(K+7) is the new group. This will
  become the first group in the list of available space, and it must
  contain a pointer to the second group. But the second group is the
  old first group, and a pointer to that group was contained in LAVS.
  This means that LAVS must be moved into the pointer position
  FREE(K+6) and FREE(K+7).

• Since LAVS must contain, at all times, a pointer to the first group
  in the list of available space, we must now set LAVS equal to K.


The first operation above can be 
modified to check for overflow. If it is 
performed when the list of available 
space contains exactly one group, it is 
not hard to see that LAVS will be set 
equal to 0. This is not in itself an error; it 
merely means that all available space is 
being used. The next time we do this, 
though, there will be an error unless we 
check for it. Therefore, when we set K = 
LAVS, we should check to see if K is now 
0; if so, there is an overflow condition. 
We are, of course, using the word 
“overflow” in a generalized sense to 
denote the fact that there is too much 
space being used for the available 
memory in the FREE array. 

Using these two basic operations, we can now make sure that our
available space list is always kept up to date. Suppose that we have a
variable J with a string value, and suppose that this string value is
kept in m 8-character groups. A pointer to the first of these groups
will be kept in J itself. Suppose that we are now going to set J to a
new string value, which is kept in n 8-character groups. First we
apply the second algorithm above to the first group in the chain that
represents the old value of J. This process puts this group on the
list of available space. If m > 1, that is, if the pointer in this
first group was not originally 0, we apply the same process to the
second group in the chain representing the old value of J, and so on
through the rest of these groups. (It is not necessary to know m, of
course; we merely test for the pointer being 0, which indicates the
last group.) Now we take n groups, or, in general, as many groups as
we need, off the front of the list of available space by using the
first algorithm above, and use these groups to store the new string
value of J.

This system is quite workable as it 
stands; the only real problems with it 
come when we try to extend it. Suppose, 
for example, that we want to set the 
string value of J equal to the 
current string value of I. In that case we 
might want to save quite a bit of time by 
setting the pointer in J to be the same as 
the pointer in I. Thus we would have two 
pointers to the same group, or to the 
first group of the same chain, in the 
FREE area. This scheme, however, will not work unless we change our
setup a bit. The problem comes when the value of I is later changed to
something else. In this case the old value of I is put back on the
list of available space, and this is improper because it is still the
current value of J.

Now look at this case in more detail. Suppose that the value of I is
‘SMITH’, and we set J equal to ‘SMITH’ by setting J to point to the
same place that I does. Suppose that we later set I equal to
‘JOHNSON’. In this case, according to the algorithms we have
discussed, the group [there is only one in this case; let us call it
FREE(K) through FREE(K+7)] which contains ‘SMITH’ is put back on the
list of available space, even though K is still the integer value of
J. Now we need two groups to represent ‘JOHNSON’. One of these will be
this same group, that is, FREE(K) through FREE(K+7), because it was
just put back on the beginning of the list of available space. This
group will therefore contain JOHNSO, with the final N in the next
group. This means that if at some later time we want to print out the
value of J, we will print out JOHNSON rather than SMITH.
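
The failure can be reproduced with a small model. The sketch below is an assumption-laden simplification: each group is a two-element Python list `[data, next]` rather than eight characters of FREE, index 0 is unused so that 0 can mean "none", and the names `make_store`, `assign`, `release` and `value` are invented for the example.

```python
SIZE = 6  # data characters per group


def make_store(n):
    """Create n empty groups, all chained on the list of available space."""
    groups = [None] + [["", k + 1] for k in range(1, n + 1)]
    groups[n][1] = 0                      # last group points nowhere
    return {"groups": groups, "lavs": 1}


def release(store, k):
    """Return the whole chain starting at group k to the available list."""
    while k != 0:
        nxt = store["groups"][k][1]       # read the pointer before overwriting it
        store["groups"][k][1] = store["lavs"]
        store["lavs"] = k
        k = nxt


def assign(store, text):
    """Store text in a fresh chain; return a pointer to its first group."""
    pieces = [text[i:i + SIZE] for i in range(0, len(text), SIZE)] or [""]
    ks = []
    for _ in pieces:                      # take one group per piece
        k = store["lavs"]
        if k == 0:
            raise MemoryError("overflow")
        store["lavs"] = store["groups"][k][1]
        ks.append(k)
    for k, piece, nxt in zip(ks, pieces, ks[1:] + [0]):
        store["groups"][k] = [piece, nxt]
    return ks[0]


def value(store, k):
    """Follow the chain from group k and rebuild the string."""
    out = ""
    while k != 0:
        data, k = store["groups"][k]
        out += data
    return out


store = make_store(4)
i = assign(store, "SMITH")
j = i                            # J shares I's pointer instead of copying
release(store, i)                # I's old value goes back on the list...
i = assign(store, "JOHNSON")     # ...and its group is reused at once
seen_through_j = value(store, j) # J now reads JOHNSON, not SMITH
```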

One solution to this problem which is sometimes adopted is to reserve
the first character of any string for a special integer telling us how
many variables have this particular string as their value. This
integer is known as a reference count. It is usually 1, but in the
case above, where I and J point to the same string, it would be 2.
Every time a variable is set to a new value, the reference count in
the old value is decreased by 1. Only if its value is then 0 do we
return the space it uses back to the list of available space, because
otherwise there are still variables which have that string as their
value. The trouble with this scheme is that it may very easily not be
worth the effort. Do we really want to add an extra character to every
string, not to mention the extra testing that goes on whenever we set
a string to a new value, just to be able to save a little time and
space in an operation (eg: setting one string to be the same as
another) that might not be that commonly used in our program? It is
certainly a debatable point.
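
A minimal sketch of the reference-count bookkeeping, in Python. The class `Shared`, the list `freed` (standing in for the list of available space) and the function `set_var` are all hypothetical names; the count is held in an object field rather than in the first character of the string.

```python
class Shared:
    """A stored string plus its reference count (hypothetical layout)."""

    def __init__(self, text):
        self.text = text
        self.count = 0


freed = []   # stands in for the list of available space


def set_var(variables, name, shared):
    """Point variable `name` at `shared`, adjusting both counts."""
    old = variables.get(name)
    shared.count += 1              # count the new reference first
    if old is not None:
        old.count -= 1
        if old.count == 0:
            freed.append(old)      # only now may the space be reused
    variables[name] = shared


variables = {}
set_var(variables, "I", Shared("SMITH"))
set_var(variables, "J", variables["I"])      # SMITH's count becomes 2
set_var(variables, "I", Shared("JOHNSON"))   # SMITH's count drops to 1
```

After the last line SMITH is still held by J and has not been freed, which is exactly the case the plain scheme got wrong.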

It should also be clear that there is 
nothing special about the number of 
characters in a group — eight, in this 
case. The fewer characters we have in a 
group, the more pointers we will have, 
and the more space these will take up. 
The more characters we have in a group, 
the more wasted or 0 characters we will 
have in strings, because the length of a 
string is not always evenly divisible by 
the number of characters in a group. 
This is a space trade-off which should be 
tuned by the user to fit the requirements 
of a particular program.




If you have written computer programs in any language, you must be
aware by now what a subroutine is, although you might not have written
any. The basic concept of a subroutine is present in all computer
languages, although every language implements it a bit differently
from the others. In systems based on the 8080, the 8085, or the Z-80,
you write CALL SUB to call the subroutine called SUB. On the 6800 and
the 6502, it’s JSR SUB, while in BASIC it’s GOSUB n, where the first
statement of the subroutine SUB is on line number n. But regardless of
the language, the concept is the same: you have something in your
program that you want to do more than once. It may be looking up an
element in a table; it may be printing a list; it may be making an
access to a data structure; but whatever it is, you do it at various
times in your program.

You don’t want to have to write out the same instructions over again
every time you need that particular job to be done, because this is
wasteful of memory space. So, therefore, you group together the
instructions that do this job into a subroutine, and then at any point
that you want the job to be done, you put in an instruction to call
the subroutine. When the subroutine is finished, it returns to the
point immediately following the place where it was called. This is
also done differently in different programming languages: you write
RETURN in BASIC, RET for the 8080 and Z-80, and RTS (ie: return from
subroutine) for the 6800 and 6502.


W D Maurer 


All this is fine if the job you want to 
do repeatedly is exactly the same every 
time you want to do it. But, in practice, 
this is usually not the case. For example, 
if you are looking up an element in a 
table, you are probably looking up a dif- 
ferent element each time. If you are 
multiplying two 16-bit quantities — a 
very common subject for a small-system 
subroutine — the quantities you are 
multiplying are probably not the same 
from one multiplication to the next, and 
the result is also probably not the same 
variable. This is true even though the 
logic of multiplication does stay the 
same. It is this that has led to the idea of 
subroutine parameters, the subject of 
this article. 


Parameters 


In applied mathematics, there is a 
concept of parameter which will be 
familiar to those small-system users who 
have backgrounds in engineering or 
physical science. Consider, for example, 
the graph of a function. You are usually 
expressing y in terms of x, but if you are 
constructing the graph of a circle, it is 
sometimes more useful to introduce 
another variable θ to represent the
angle, and then to express both x and y
in terms of θ. The variable θ, in this context,
is called a parameter. In computer
programming, however, whether on 
large or small systems, the word 
“parameter” has a more general mean- 
ing, and one which does not require any 
knowledge of applied mathematics; it is 




simply any variable which is used by a 
subroutine, and which is supplied to that 
subroutine by the program that calls it. 

Parameters of subroutines are related 
to arguments (sometimes called 
parameters or formal parameters) of
functions. If you have a function f(t) or
g(a, b) or h(x, y, z), then t, a, b, x, y, and z
are the arguments. On a computer, the 
value of a function is computed by a 
subroutine, and this must be considered 
as one special kind of subroutine. Some 
languages allow you to use functional 
notation for functions; thus h(x, y, z) 
might be FNH(X, Y, Z) in BASIC, provid- 
ed that the definition of h was simple 
enough. In assembly language, however, 
one generally uses the same instructions 
(eg: CALL, JSR, or whatever), whether one
is calling a subroutine to calculate the 
value of a function or a more general 
subroutine. 

Those who work with big computers 
have laid out a considerable amount of 
terminology dealing with parameters 
and how they are supplied, or passed, to 
a subroutine by the program that calls it 
(and sometimes vice versa). One of the 
purposes of this article is to lay out this 
terminology for the small-system user so 
that he or she will not have to “reinvent 
the wheel.” It should be emphasized 
that for a long time mathematicians 
believed that there ought to be a single 
concept of parameter that would work 
well in all situations. Gradually we have 
come to realize that there are at least 
four, and probably a good deal more, 
reasonable implementations of 
parameter passing. These will be de- 
tailed in what follows. 


Two Examples 


To illustrate why the concept of 
parameter differs from one situation to 
another, let us consider two simple 
subroutines: an output subroutine and a 
multiplication subroutine. The output 
subroutine will be called OUTPUT(X), 
and its job will be to output the 
character X. The multiplication 
subroutine will be called MULT16(I, J, 
N), and its job will be to multiply the two 
16-bit quantities I and J, producing the
result N. The problem we are to solve is 
how to call OUTPUT(Z), OUTPUT(Q), 
and so on, for various characters we 
wish to output, and similarly MULT16(A, 
B, C), MULT16(U, V, W), and so on, for 
various multiplications we wish to per- 
form. 

Consider first the case of the output 
subroutine. Suppose that in this 


subroutine there is a variable called X. In 
order to output Z, for example, we move 
Z to X just before calling OUTPUT. The 
same sort of thing will work for Q, or any 
other character we wish to output. This 
method of passing parameters is known 
as call by value. It may be defined more 
formally as follows. Suppose we have a 
subroutine such as OUTPUT(X), where X 
stands for any parameter, such as Z or 
Q, that might actually be supplied. Here 
Z and Q are called the actual 
parameters, and X is called the formal 
parameter. Then call by value consists
of: 


1. Moving the value of the actual 
parameter to the formal
parameter. (If there is more than 
one formal parameter, as in the 
case of a function h(x, y, z), then 
they must all be moved.) 

2. Calling the subroutine. 


In assembly language it is very com- 
mon for X, in a situation such as the 
above, to be a register. Then all we have 
to do is to load the register before we
call the subroutine; the subroutine
assumes that Z, or Q, or whatever stands
for X, is in that register. (On the 8080, the
Z-80, the 6800, and the 6502, the most
common register used for this purpose is
the A register, although ISIS, the
operating system for the Intellec, which
is an 8080-based system, uses the C 
register.) 

If we now look at MULT16, however, 
we can see without too much trouble 
that call by value does not work. Let us 
see why not by laying out a specific ex- 
ample. Suppose we are calling 
MULT16(U, V, W), where MULT16 has 
been defined as a subroutine with 
parameters I, J, and N. That is, I, J, and N
are the formal parameters. To use call
by value, we would first have to move
the values of U, V, and W into I, J, and N.
That is, U would be moved to I; V would
be moved to J; and W would be moved
to N. Now we would call the subroutine;
and the subroutine, we are assuming,
multiplies the 16-bit quantities I and J
and sets N equal to the result.

What is wrong with this? Since we were calling MULT16(U, V, W), what
we presumably wanted was to multiply the two 16-bit numbers U and V,
and set W equal to the result. It is not too hard to see that we did,
actually, multiply U by V, because we set I equal to U, and J equal to
V, and then we multiplied I by J. But what happens to W? We set N
equal to the result of multiplying U by V, but we did not set W equal
to anything. (Earlier, we also set N equal to W, a useless operation.)
The general situation here is that whenever we have a formal parameter
that is set to some new value by a subroutine, call by value will not
work; the corresponding actual parameter will not be set to the new
value or to any other new value.
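
The failure is easy to reproduce in a small simulation. In this Python sketch, which is an illustration rather than anything from the original article, a dictionary `mem` stands in for memory, the subroutine can see only its own formal parameters I, J and N, and the helper `call_by_value` performs the two steps listed earlier. The starting values are arbitrary.

```python
mem = {"U": 300, "V": 20, "W": 0,      # the caller's variables
       "I": 0, "J": 0, "N": 0}         # MULT16's formal parameters


def mult16():
    """The subroutine body: it knows only its own formals I, J and N."""
    mem["N"] = mem["I"] * mem["J"]


def call_by_value(actuals, formals, subroutine):
    for a, f in zip(actuals, formals):
        mem[f] = mem[a]        # step 1: move actual values to the formals
    subroutine()               # step 2: call the subroutine


call_by_value(["U", "V", "W"], ["I", "J", "N"], mult16)
product_in_n = mem["N"]        # the product is here...
w_after_call = mem["W"]        # ...but W was never touched
```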

Because of this, people who work with big computers came up with three
alternative methods of passing parameters. The first of these is known
as call by value and result or, sometimes, informally as
“copy-restore.” The second is known as call by reference or sometimes
as “call by address” or “call by location.” The third is known as call
by name. We shall take up each of these in turn.


Call by Value and Result


Call by value and result is a rather straightforward way of fixing the
bug in call by value that should be evident from the preceding
discussion. In fact, what we wanted to do in our MULT16 routine was as
follows:


1. Set I equal to U and J equal to V.

2. Call the subroutine (which multiplies I by J, giving N).

3. Set W equal to N.


In other words, there are two parameter-passing operations, one before
the subroutine starts, the second one after it ends, and one is the
reverse of the other. In the first operation we move actual parameters
to formal parameters. In the second operation we move formal
parameters to actual parameters. The parameters we move the first time
are the ones that are used by the subroutine; the parameters we move
the second time are the ones that are set by the subroutine.

But how can we tell which parameters are used and which ones are set?
It won’t always be the case that the first two are used and the last
one is set (if there are three altogether). They might all be used, or
two of them might be set, or any number of possible combinations.
Again, there is more than one reasonable solution to this problem. The
solution chosen by the designers of a number of computer languages in
wide-spread use by the American military establishment (NELIAC,
JOVIAL, CMS-2) was to build the distinction between used and returned

parameters into the syntax of the 
language. In other words, when you call 
a subroutine in any one of these 
languages, you would have to specify, in 
some way, which of these you intended 
to be used and which you intended to be 
returned. (JOVIAL, for example, uses a 
semicolon; we would speak of 
MULT16(U, V; W), for example, where 
the semicolon separates the used 
parameters U and V from the returned 
parameter W.) This certainly solves the 
problem, although only if you are going 
to use call by value and result, at the 
cost of making life a trifle more com- 
plicated for those who do not want to 
have to worry about how parameters are 
passed. 

The other solution, chosen by IBM, is 
to regard all parameters as both used 
and returned at all times. This may seem 
a bit wasteful, but in fact, compared to 
call by reference (to be described 
below), it is more efficient, most of the 
time. It does, however, lead to some 
strange and unusual results, the most 
famous of which may be illustrated as 
follows. Suppose we have a subroutine 
D(X, Y), where X and Y are the formal 
parameters, and suppose that this sets X
to 0 and does not change Y. Now sup- 
pose that we call D(L,L). Of course, we 
would like this to set L equal to 0. But 
see what happens: 


1. Since X and Y are treated as both 
used and returned, our first step is to set 
X equal to L, and Y equal to L. 

2. Now we call the subroutine, which 
sets X equal to 0 and does not change Y. 

3. Finally, we return the actual 
parameters. First we return X by setting L 
equal to X. Since X is now 0, this will set 
L equal to 0, which is exactly what we 
wanted. But now we return Y by setting L 
equal to Y. Since Y is still the original 
value of L, this will undo the previous 
result, and the final outcome will be that 
L is the same after calling D as it was 
beforehand! 


The behavior illustrated above can be 
avoided simply by setting L1 equal to L 
and then calling D(L, L1), rather than
D(L, L). In general, when using call by 
value and result, with all parameters 
used and returned, one should never use 
two actual parameters which are the 
same. The problem above actually hap- 
pened to one of my students, who wrote 
a big FORTRAN program that ran on the 
CDC 6400, a computer using call by 
reference, but mysteriously failed to run 
on the IBM 360, a computer using call 




by value and result. Many hours of 
analysis traced the bug to a subroutine 
call like D(L, L) above. 
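
The D(L, L) anomaly can be checked with a short simulation. The Python below is a sketch: a dictionary `mem` plays the part of the caller's variables, variable names stand in for addresses, and the functions `call_value_result` and `call_reference` are hypothetical stand-ins for the two calling conventions, not any particular compiler's behavior.

```python
def d_body(cells):
    """Subroutine D(X, Y): sets X to 0 and leaves Y alone."""
    cells["X"] = 0


def call_value_result(mem, actuals):
    """All parameters are treated as both used and returned."""
    formals = ["X", "Y"]
    cells = {f: mem[a] for f, a in zip(formals, actuals)}   # copy in
    d_body(cells)
    for f, a in zip(formals, actuals):                      # copy out, X then Y
        mem[a] = cells[f]


def call_reference(mem, actuals):
    """The subroutine works directly on the caller's variables."""
    x_addr, _y_addr = actuals    # names stand in for addresses
    mem[x_addr] = 0              # "X = 0" stores through X's address


mem = {"L": 7}
call_value_result(mem, ["L", "L"])
after_copy = mem["L"]            # returning Y has undone the change to L

mem = {"L": 7}
call_reference(mem, ["L", "L"])
after_ref = mem["L"]             # L really is 0

mem = {"L": 7, "L1": 7}
call_value_result(mem, ["L", "L1"])   # the D(L, L1) workaround
after_fix = mem["L"]
```

The outcome of the first call depends on the order in which the parameters are copied back; the sketch returns X first and Y second, as in the steps above.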


Call by Reference 


Call by reference, historically, preced- 
ed call by value and result, although it 
was not known by that name at that 
time. The idea of call by reference is to 
give the subroutine the addresses of its 
parameters, rather than their values. 
Then, when the subroutine either uses or 
sets one of its formal parameters, it does 
so by making a reference to that address. 
Let us see how this would work on a 
small system. 


1. On the 8080, you can load the HL 
register pair with the address of the 
parameter a with the instruction LXI H, 
a just before calling the subroutine. 
Then, in the subroutine, if you need to 
load this parameter into any register r, 
you can use MOV r,M; if you need to
operate on it arithmetically, you can use
ADD M, SUB M, ANA M, and the like; if 
you need to set it to a new value which 
is now in register r, you can use MOV 
M,r. If you need the HL register pair for 
other purposes in your routine, you can 
do an XCHG if you don’t need the DE 
register pair, or you can PUSH H while 
you use HL and POP H afterward. If 
there are two parameters, you can load 
one into HL with LXI H, a as before, and 
load the other one into BC or DE. If 
there are several parameters, you can 
push their addresses onto the stack 
before calling the subroutine, and pop 
them back within the subroutine. 

2. On the 6800, you can load the X reg- 
ister with the address of the parameter 
a with the instruction LDX # a (where the 
# specifies an immediate addressing in- 
struction) just before calling the 
subroutine. You can now use indexed ad- 
dressing instructions to manipulate the 
parameter by loading it (LDAA 0,X or 
LDAB 0,X), storing it (STAA 0,X or STAB 
0,X), or performing arithmetic operations
such as ADDA 0,X or ANDB 0,X. If
there is more than one parameter, you 
can move the addresses of all the actual 
parameters to fixed locations within the 
subroutine before calling it. The 
subroutine can then load each of these 
into the X register when needed, after 
which any of the indexed instructions 
discussed above may be used. 

3. On the 6502, there is a general 
method involving loading the X register, 
just before calling the subroutine, with 
the address of a table of addresses of actual parameters. That is, we
execute LDX #a where we have written (in page 0):

a    DFB U MOD 256
     DFB U/256
     DFB V MOD 256
     DFB V/256
     DFB W MOD 256
     DFB W/256

for example, defining a byte for the low-order address and then for
the high-order address of each of the parameters U, V, and W. One can
then make reference to the actual parameters by indexed indirect
addressing: LDA (0,X) for U, LDA (2,X) for V, and LDA (4,X) for W.
This is perfectly general, since LDA (load) can be replaced by STA
(store), ADC (add with carry), CMP (compare), AND, and so on.

4. On the Z-80, you can (as always) 
mimic the 8080, or you can use registers 
IX and IY to contain the addresses of 
parameters. 
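
Whatever the processor, the idea is the same: the subroutine receives addresses and reaches its parameters through them. The following Python sketch models that with a list `memory` whose indices play the role of addresses; the particular addresses chosen for U, V, W, A, B and C are arbitrary assumptions.

```python
memory = [0] * 16            # a toy memory; indices act as addresses
U, V, W = 3, 4, 5            # assumed addresses of the caller's variables
A, B, C = 8, 9, 10


def mult16(i_addr, j_addr, n_addr):
    """Receives addresses; every use of a formal goes through one."""
    memory[n_addr] = memory[i_addr] * memory[j_addr]


memory[U], memory[V] = 300, 20
mult16(U, V, W)              # MULT16(U, V, W): the result lands in W itself
memory[A], memory[B] = 6, 7
mult16(A, B, C)              # same routine, different actual parameters
```

Unlike call by value, nothing has to be copied back afterward, because the store to N already went to the caller's variable.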


An additional advantage of call by 
reference is that it allows you to have, as 
a parameter, the name of an array. For 
example, you might be writing a 
subroutine to compare two character 
strings to see if they are the same. There 
would be two parameters, namely the 
two character strings. If you used call by 
value, you would have to move these en- 
tire strings into new locations just before 
calling the subroutine. This would be 
wasteful of both time and space, and is, 
in fact, never done; even systems that 
use call by value or call by value and 
result, if they allow array names as 
parameters, use call by reference (or call 
by name, to be discussed below) for 
these. Thus you would only be passing, 
from the program to the subroutine, the 
two string starting addresses; that is, for 
each string, the address of its first byte. 

One important source of confusion, 
when call by reference is used, has to do 
with how to return a parameter. A large 
number of programmers try, when they 
are writing a subroutine, to have it put 
its answer somewhere and then furnish 
the main program with the address of 
where that somewhere is. This never 
works, because the main program has no 
way of using that information. It is not 
up to the subroutine to tell the main pro- 
gram where the information is to be 
returned; it is up to the main program to 
tell the subroutine where to return the in- 
formation, and then the subroutine must 
return the information to that point. In 
particular, the subroutine will never be 


right if it returns a subroutine. If call by 
reference is used, it should be 
remembered that this subroutine can be 
called more than once, with different ac- 
tual parameters each time, and 
therefore, when it changes the value of
one of its actual parameters, that 
change must be made by storing this 
new value in an indexed location — 
where the index is normally the HL 
register pair on the 8080, the X register 
on the 6800 and 6502, and possibly the 
IX or IY register on the Z-80. 

Call by reference is, in general, less efficient than call by value
and result, particularly if we make reference to a parameter inside a
loop. One technique that has been tried on big computers, and works
rather well for subroutines that take large amounts of time, is
address modification. This involves storing the addresses which are
passed as parameters directly into the instructions that use them.
Unfortunately, this technique is inappropriate in most microcomputer
systems, where the instructions are in read-only memory and thus
cannot be modified as the program is running. It should also be
mentioned that on some systems which use both call by reference and
call by value and result, the second of these is implemented as a
special case of the first. That is, it is always the addresses, or
references, that are passed, so that there is only one kind of
standard subroutine protocol rather than two. But whenever call by
value and result is to be used, the subroutine, rather than the main
program, performs the setting of formal parameters to actual parameter
values and vice versa.


Call by Name


This finally brings us to call by name, the easiest to define, yet the
hardest to understand, of the better known parameter passing methods.
For years, call by name was a pons asinorum among big computer
software people; that is, the way of distinguishing the bright from
the dumb, or the “with-it” from the “not-with-it,” was whether you
understood call by name. Lately there has been a bit less interest in
call by name among practical computer people, since, although it was
used in ALGOL 60, one of the first big computer languages (in both
senses: big [computer languages] and [big computer] languages), it has
not been used in most languages developed since then. But an
understanding of it, and of some of the problems that arise with it,
is still essential to the amateur as well as the professional computer
scientist.

Call by name is defined as follows. 
Suppose I have a subroutine with a formal
parameter X. Suppose I call this
subroutine, with actual parameter Y. 
Then call by name implies that the 
subroutine is executed as if we had gone 
through it and substituted Y for every 
occurrence of X. 

There is one important proviso to the above, which may be illustrated
as follows. Suppose that in the subroutine we have A=B*X. Suppose now
that the actual parameter is not Y, but rather U+V. (It is quite
permissible to call SUB(U+V), for example, where SUB is the name of a
subroutine.) Now we would like to proceed as if A=B*X really means
A=B*(U+V); but if we substitute U+V for X, as in the above definition,
we obtain A=B*U+V, which is not quite the same. Therefore we need to
change the definition so as to specify the insertion of parentheses.
On the other hand, it should also be clear that we do not want to
insert parentheses all the time. For example, the variable A could
have been the formal parameter, rather than X. In this case, the
actual parameter could not be U+V, because then A=B*X would be
interpreted as (U+V)=B*X, which makes no sense. But suppose the actual
parameter is Y, just as before; we still do not want to write (Y)=B*X
(with parentheses) in BASIC or any other algebraic language. Therefore
the rule is that the actual parameter is substituted for the formal
parameter, inserting parentheses wherever syntactically possible; this
is the phrase used in the definition of ALGOL 60.
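
The effect of the parentheses can be seen by doing the substitution literally on text. This Python sketch is only an illustration: the function `substitute` is a hypothetical helper that replaces characters blindly (so it assumes single-letter names), and the values given to B, U and V are arbitrary.

```python
def substitute(body, formal, actual, parenthesize):
    """Replace every occurrence of `formal` in `body` by `actual`."""
    text = "(" + actual + ")" if parenthesize else actual
    return body.replace(formal, text)


env = {"B": 2, "U": 3, "V": 4}

naive = substitute("B*X", "X", "U+V", parenthesize=False)
fixed = substitute("B*X", "X", "U+V", parenthesize=True)

naive_value = eval(naive, {}, env)   # B*U+V: the multiplication binds first
fixed_value = eval(fixed, {}, env)   # B*(U+V): what was intended
```

With these values the unparenthesized text evaluates to 10 and the parenthesized one to 14.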

So long as the actual parameters are 
not expressions like U+V (or like A(I),
which could be either a subscripted 
variable or a reference to a function), 
call by name is almost identical to call 
by reference. Therefore, in studying the 
differences between the two, we have to 
look at the general rules for handling ac- 
tual parameters which are expressions. 
These are that an actual parameter can- 
not be an expression (other than a single 
variable, either subscripted or not) when 
the corresponding formal parameter is 
returned, as we have illustrated above 
with the formal parameter A and the ac- 
tual parameter U+V; and, of course, a 
formal parameter can never be an ex- 
pression. 

Suppose that in our subroutine we 
have S=S+X, where X is a formal 




parameter, and the corresponding actual parameter is A(I). (This is a
simplification of an actual example given with the definition of
ALGOL 60.) Therefore S=S+X becomes S=S+A(I). But now suppose that we
want to do this for I=1 to 10. That would be, presumably, a way of
adding the numbers A(1) through A(10), if S were originally set to 0.
If we use call by reference, however, this will not work. In call by
reference, the address of the actual parameter, in this case, the
address of A(I), would be given to the subroutine. When the subroutine
does S=S+X, it would get X from the location which has that address.
But that location is a constant location: the location, in fact, of
A(I) where the variable I has whatever value it had before the
subroutine was called. This means that we add X ten times, whatever X
is, and in this case we add the same value of A(I) ten times, rather
than adding A(1) through A(10).

How would we implement call by name? In the above case, when the
subroutine does S=S+X, it has to have a way of finding out whether X
will stand for a different variable each time. Therefore it loads S
and then calls a subroutine to find the value of X, which it then
adds, storing the result in S. This means that it is the address of
the start of this subroutine that is passed, rather than the address
of X itself as in call by reference. (This is known as Jensen's
device, after a programmer at Regnecentralen, or the National Computer
Center of Denmark, who used it in implementing ALGOL 60.) We should
remark that there is another entirely different way of implementing
call by name, which is to replace each call to a subroutine,
separately, by the subroutine with the substitutions performed as
discussed above. This will not work for ALGOL 60, because it will not
work, in general, for recursive subroutines, and it also takes up
quite a bit of space if the subroutines are long.
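
The difference between the two methods for S=S+X can be sketched in Python by passing a small function that produces X on demand (such a parameter routine is commonly called a thunk). The array contents, the dictionary `env` holding the caller's I, and the name `add_up` are assumptions made for this example.

```python
A = [None] + [10 * k for k in range(1, 11)]   # A(1)..A(10); A[0] is unused
env = {"I": 1}                                # the caller's variable I


def add_up(get_x):
    """S=S+X for I=1 to 10, where X is obtained by calling get_x()."""
    s = 0
    for i in range(1, 11):
        env["I"] = i
        s = s + get_x()
    return s


# Call by reference: the location of A(I) is fixed once, before the call.
fixed = env["I"]
by_reference = add_up(lambda: A[fixed])       # adds A(1) ten times

# Call by name: A(I) is evaluated afresh on every use of X.
by_name = add_up(lambda: A[env["I"]])         # adds A(1) through A(10)
```

Here `by_reference` comes to ten times A(1), while `by_name` is the sum the programmer presumably wanted.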

Call by name is considerably less effi- 
cient than the other methods we have 
discussed, which is a big reason for its 
general decline. Nevertheless, it has its 
own unexpected advantages. Let us con- 
sider a subroutine like D(X, Y), which we 
discussed earlier, but this time suppose 
that it simply uses X and does not use Y, 
and let us call D(A, F(B)), where F(B) is a 
reference to a function. Suppose further 
that the calculation of F(B) (for some 
reason) gets the computer into an 
endless loop. If we use call by name, 
then, since we never use Y, we have no 
occasion to call the subroutine that 
calculates Y — that is, we never call 
F(B). If we use call by value, however, 
the first thing we do is to set X equal to A 
and Y equal to F(B). The result is that we 
get into the endless loop, in this case, if 
we use call by value, but not if we use 
call by name. 
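
A small sketch of this effect, again in Python rather than ALGOL: the function names are invented, and the endless loop in F is replaced by a harmless routine that merely records that it was called, so that the example terminates.

```python
calls = []

def f(b):
    """Stand-in for F(B); instead of looping forever it records the call."""
    calls.append(b)
    return b * 2

def d_by_name(x, y):
    """D(X, Y) that uses X only. Parameters are routines, so Y is never evaluated."""
    return x() + 1

def d_by_value(x, y):
    """The same D(X, Y), but both arguments are evaluated before entry."""
    return x + 1

a, b = 5, 7
r1 = d_by_name(lambda: a, lambda: f(b))
calls_after_name = len(calls)      # F has not been called
r2 = d_by_value(a, f(b))
calls_after_value = len(calls)     # F was called once, before D started
```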


Easy-to-Use Hashing Function 


Don Kinzer 


Hashing, or scatter storage, is a well-known and widely used technique 
for handling lists. Perhaps the most common usage is in assemblers and 
compilers where it greatly speeds the handling of symbols. This article 
briefly discusses the merits and drawbacks of hashing relative to other 
sorting and searching techniques and presents an easy-to-use hashing 
function implemented on a 6800 microprocessor. 

The concept of hash tables first appeared in the literature around 1953, 
but it is generally accepted that hashing was used prior to that. Other 
names given to the same process are scatter storage, randomized storage 
and key transformation table. These names will be seen to be equally 
applicable shortly. 

Using the hashing technique, a symbol (ie: collection of alphanumeric 
characters) to be put in the table is processed through a hashing 
function to obtain an index into a storage table. This index is then used 
for the address of a potential storage space for that symbol. 

We say potential because it is possible that some other symbol could have 
previously hashed to the same location. Such an occurrence is called a 
collision and the current symbol must be reprocessed to generate a new 
table address which is again checked for being empty and so on until an 
opening is found. 

When it is necessary to look up the value of a symbol a process similar to 
that above is performed. The symbol is processed through the same hashing 
function as before. Next the address is checked to make sure that it is 
not empty. If it is empty, the symbol is undefined. Now that we know a 
symbol is stored there, we must check if it matches the symbol we are 
looking for because this may be a collision. If the symbols do not match, 
we have to rehash just as before until we find the symbol or an empty 
location. 

With a given set of symbols, a given 
hashing function and a specified table 
length, it is possible that trying to insert 
a particular symbol into the table will 
result in an infinite number of collisions 
indicating no empty spaces even though 
the table is not full. Likewise, another 
symbol may take many attempts before 
being finally inserted. 
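
The insertion and lookup procedures described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the 6800 routine of this article: probe_sequence uses an invented rehash rule, and max_tries is a hypothetical cap on attempts that guards against the endless run of collisions just mentioned.

```python
EMPTY = None

def probe_sequence(symbol, size, max_tries):
    """Yield up to max_tries table indexes for symbol (hypothetical rehash rule)."""
    h = sum(ord(c) * (i + 1) for i, c in enumerate(symbol))
    for _ in range(max_tries):
        h = (h * 5 + 1) % 2**16          # simple stand-in for "rehash"
        yield h % size

def insert(table, symbol, value, max_tries=8):
    """Store (symbol, value) in the first empty slot on the probe sequence."""
    for idx in probe_sequence(symbol, len(table), max_tries):
        if table[idx] is EMPTY or table[idx][0] == symbol:
            table[idx] = (symbol, value)
            return True
    return False                          # too many collisions: give up

def lookup(table, symbol, max_tries=8):
    """Follow the same probe sequence; an empty slot means the symbol is undefined."""
    for idx in probe_sequence(symbol, len(table), max_tries):
        if table[idx] is EMPTY:
            return None
        if table[idx][0] == symbol:
            return table[idx][1]
    return None

table = [EMPTY] * 64
names = ["START", "LOOP", "END", "COUNT", "TABLE"]
inserted = [insert(table, name, i) for i, name in enumerate(names)]
```

Because lookup retraces exactly the probes made by insert, a stored symbol is found after the same number of attempts that were needed to store it.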

It should be quite obvious that the 
ideal case would be an infinitely long 
table space. However, a real-world com- 
promise dictates that we waste a percen- 
tage of the table to keep the number of 
rehashes low. The trade-off is very evi- 
dent. The lower the percentage of table 
utilization, the lower will be the number 
of collisions. As the percentage of table 
utilization increases, so will the number 
of collisions. Furthermore, the number 
of collisions, and therefore the number 
of rehashes, directly affects execution 
time. We encounter the memory size 
versus speed trade-off once again. In 
practice, a reasonable compromise is to 
try for 50% to 80% table utilization and 
to determine empirically the hash count 
(ie: number of rehashes allowed). If the 
hash count is exceeded on a symbol insertion operation, the table is 
declared full, but on a symbol retrieval operation the symbol is declared 
undefined. 


Listing 1: The assembly listing of the hash function and random-number 
generator. If the first hash of a label does not work, the routine is 
entered a second time through the REHASH function. The random- 
number generator generates a 24-bit random number which is used to 
determine the table location of the label. 


*
*
*    STORAGE
*
0020                  ORG   $20
0020           RNDM   RMB   3
0023           TBLADR RMB   2
*
*
1000           SYMTBL EQU   $1000
*
*
0100                  ORG   $100
*
*
**   HASH
*    THIS ROUTINE PROCESSES THE SYMBOL POINTED TO BY X
*    AND RETURNS THE ADDRESS OF A SYMBOL LOCATION OF
*    THE 4K SYMBOL TABLE IN THE X REGISTER.
*    IT IS UP TO THE CALLING ROUTINE TO CHECK THE
*    VALIDITY OF THE SYMBOL LOCATION.
*
*    REHASH IS THE ENTRY POINT IF A RETRY IS NECESSARY
*    THE B REGISTER CONTAINS THE HASH COUNT (THE
*    NUMBER OF TIMES HASH + REHASH HAVE BEEN CALLED).
*
0100 5F        HASH   CLR B              SET HASH COUNT TO 0
0101 A6 00            LDA A 0,X          GET FIRST CHAR
0103 AB 05            ADD A 5,X          FOLD THE SYMBOL TO 3 BYTES
0105 97 22            STA A RNDM+2
0107 A6 01            LDA A 1,X
0109 A9 04            ADC A 4,X
010B 97 21            STA A RNDM+1
010D A6 02            LDA A 2,X
010F A9 03            ADC A 3,X
0111 97 20            STA A RNDM         FOLD DONE
*
*
0113 5C        REHASH INC B              UP THE HASH COUNT
0114 BD 01 2C         JSR   RANDOM       GENERATE RANDOM BITS
0117 96 22            LDA A RNDM+2       GET A BYTE OF RANDOM
0119 94 0F            AND A $0F          MASK OFF 4
011B 36               PSH A              SAVE TILL LATER
011C 96 21            LDA A RNDM+1       GET ANOTHER BYTE
011E 94 F8            AND A $F8          MASK OFF 5 (9 BITS TOTAL)
0120 8B 00            ADD A #SYMTBL      ADD TO LS HALF
*                                        OF SYMBOL TABLE ADDRESS
0122 97 24            STA A TBLADR+1     SAVE
0124 32               PUL A              GET FIRST BYTE BACK
0125 89 10            ADC A #SYMTBL/256  ADD TO MS HALF OF
*                                        SYMBOL TABLE ADDRESS
0127 97 23            STA A TBLADR       SAVE
0129 DE 23            LDX   TBLADR       GET ENTRY ADDRESS
012B 39               RTS                DONE
*
*
**   RANDOM
*    THIS ROUTINE RETURNS A 24 BIT RANDOM
*    NUMBER IN RNDM THROUGH RNDM+2 WHICH MUST BE
*    NON-ZERO UPON ENTRY. THE ROUTINE MAKES 24
*    PASSES TO INVOLVE ALL BITS OF THE SEED IN
*    THE RESULT.
012C 36        RANDOM PSH A
012D 37               PSH B              SAVE REGS
012E C6 18            LDA B #24          GET LOOP COUNT
0130 96 20     RNDLP  LDA A RNDM         GET MS BYTE
0132 49               ROL A
0133 49               ROL A
0134 49               ROL A
0135 49               ROL A
0136 49               ROL A
0137 98 20            EOR A RNDM         XOR BIT 18 WITH 23
0139 49               ROL A              PUT RESULT IN CARRY
013A 79 00 22         ROL   RNDM+2       SHIFT IT IN
013D 79 00 21         ROL   RNDM+1       PROPAGATE ACROSS
0140 79 00 20         ROL   RNDM         ALL THE WAY
0143 5A               DEC B              PASS DONE
0144 26 EA            BNE   RNDLP        LOOP TILL DONE
0146 33               PUL B
0147 32               PUL A              RESTORE REGS
0148 39               RTS                DONE
*
*
*
                      END
NO ERROR(S) DETECTED
SYMBOL TABLE:
HASH   0100   RANDOM 012C   REHASH 0113   RNDLP  0130
RNDM   0020   SYMTBL 1000   TBLADR 0023

When the table size and hashing func- 
tion are selected appropriately, the 
average number of hashes is generally 
less than log2 n, where n is the number of 
symbols in the table. Compare this to a 
linear search which averages n/2 com- 
parisons. An average assembly-language 
program will contain about 100 labels 
and symbols. Hashing would average 
fewer than seven hashes (log2 100 is 
about 6.6) while a linear list 
would require about fifty comparisons 
on the average. 

The crux of the hashing matter is find- 
ing a good hashing function which will 
minimize collisions. The procedure for 
this usually involves some complex 
mathematical analysis based on the 
characters expected in the symbols and 
their relative frequency of occurrence. 
The optimum hash function generally 
ends up being division by certain prime 
numbers or some other equally awkward 
scheme for a microprocessor. 

As an alternative to this, I offer an em- 
pirically determined hashing function 
that works well within the confines of an 
assembler. The reason for using it, 
however, was logically derived and is 
something like this: it is the purpose of 
hashing to randomly distribute symbols 
about a table, so why not use a random- 
number generator to generate the table 
index? 

The random-number generator, RAN- 
DOM, and the HASH routine used are 
shown in listing 1. The random-number 
generator uses the maximal length se- 
quence generator technique to generate 
a 24-bit random number. With a nonzero 
initial state in the 3 bytes, each call to 
RANDOM will leave them in a specific 
final state. Different initial states pro- 
duce randomly different final states. 
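
For readers who do not read 6800 code, the effect of one call to RANDOM can be modeled as below. This Python function is a sketch based on the comments in listing 1 (24 passes, bit 18 exclusive-ORed with bit 23, the result shifted in at the low end); the name random24 is invented here.

```python
def random24(state):
    """Model one call to RANDOM: 24 shift passes over a 24-bit state."""
    for _ in range(24):
        feedback = ((state >> 23) ^ (state >> 18)) & 1   # XOR bit 23 with bit 18
        state = ((state << 1) | feedback) & 0xFFFFFF     # shift left, feed result in
    return state

seed = 0x414243            # any nonzero 24-bit value
first = random24(seed)
second = random24(first)   # the last state serves as the seed for the next call
```

Each pass can be undone (the bit shifted out is recoverable from the new state), so two different seeds never lead to the same result and a nonzero seed never becomes zero.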

The HASH routine merely loads the 3 
bytes with the symbol and calls RAN- 
DOM to generate a random bit se- 
quence. The assembler for which HASH 
was written allowed six character sym- 
bols. In order to utilize every bit of sym- 
bol information to hash to an address, 
the six-character symbol is crammed in- 
to 3 bytes by “folding” it in half. This is 
done by adding the outermost bytes (ie: 
characters) together for 1 byte of the 
random-number generator followed by 
adding together the next outermost two 
characters and lastly the innermost two. 
This can be done without losing bits to 
overflow because the ASCII characters of the 
symbols have a hexadecimal value of at most 
7F. Two of these added together have a 
value of at most hexadecimal FE, which 
fits in eight bits. 

The HASH routine in listing 1 assumes 
a 4 K-byte symbol table limitation. With 
6 bytes for the symbol name and 2 bytes 
for its value this allows 512 symbol 
spaces. This being the case, only 9 bits 
are needed for a table index. Since the 
result of the call to RANDOM is 24 ran- 
dom bits, we are perfectly free to 
choose any 9 of those bits for the index. 
HASH does this by taking out the most 
significant 9 bits of the least significant 
12 bits of the generator. 
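
The folding and index selection just described can be sketched in Python as follows. This is a model, not a translation to be relied on: it assumes the two masks are the constants $0F and $F8 named in the listing's comments, that symbols shorter than six characters are padded with blanks (the article does not say), and that no carry arises in the fold. All function names are invented.

```python
SYMTBL = 0x1000     # base address of the 4 K-byte table, as in listing 1

def random24(state):
    """Model of RANDOM: 24 shift passes with bit 18 XOR bit 23 fed back."""
    for _ in range(24):
        feedback = ((state >> 23) ^ (state >> 18)) & 1
        state = ((state << 1) | feedback) & 0xFFFFFF
    return state

def fold(symbol):
    """Fold a six-character symbol into 3 bytes, outermost pair first."""
    c = symbol[:6].ljust(6).encode("ascii")   # blank padding is an assumption
    low = (c[0] + c[5]) & 0xFF                # goes to RNDM+2
    mid = (c[1] + c[4]) & 0xFF                # goes to RNDM+1
    high = (c[2] + c[3]) & 0xFF               # goes to RNDM
    return (high << 16) | (mid << 8) | low

def table_address(state):
    """Pick 9 bits and scale them to an 8-byte entry within the table."""
    hi = state & 0x0F              # low 4 bits of RNDM+2
    lo = (state >> 8) & 0xF8       # high 5 bits of RNDM+1
    return SYMTBL + (hi << 8) + lo

def hash_symbol(symbol):
    state = random24(fold(symbol))
    return table_address(state), state

def rehash(state):
    state = random24(state)
    return table_address(state), state

addr, state = hash_symbol("LOOP1")
addr2, state2 = rehash(state)
```

Every address produced lies on an 8-byte boundary between $1000 and $1FF8, which gives the 512 entries mentioned above.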

Recall that HASH only returns a 
potentially useful table address. In the 
case of a label insertion operation it is 
up to the calling routine to check that 
the returned address is empty. If not, 
REHASH is called which utilizes the last 
contents of the random-number 
generator as a seed for the next random 
number. Calling HASH again will pro- 
duce exactly the same address. Alter- 
nate means of handling collisions such 
as linear or quadratic distribution will 
not be discussed here. 

In the case of a label-retrieve opera- 
tion the calling routine needs to check if 
the symbol matches that at the table ad- 
dress. If they do not match and the loca- 
tion is not empty, REHASH until the 
hash count limit is exceeded or an emp- 
ty location is found whereupon the sym- 
bol is declared to be undefined. Note 
that it will take exactly the same number 
of attempts to find a symbol as it did to 
put it in the table to begin with. 

This has by no means been a thorough 
treatment of the subject of hashing but 
only an attempt to pass on something 
which works rather well in my ex- 
perience. The interested reader is en- 
couraged to do further research into the 
topics mentioned here. 


References 


1. Grappel, R. "Randomize Your Programming." 
BYTE, volume 1, number 13, September 
1976. 

2. Hopgood, F R A. Compiling Techniques. 
American Elsevier, 1970. 

3. Knuth, Donald E. The Art of Computer Pro- 
gramming, Volume 3, Sorting and Searching. 
Addison Wesley, 1973. 

4. Lancaster, D. "Understanding Pseudo- 
Random Circuits." Radio Electronics, April 
1975. 




Text Compression 


L Peterson 


A continuing problem on any computer system is storage. There is never 
enough computer memory for all the information we wish to store. This is 
true both for programs in main memory and for the information which 
resides on peripheral devices. 

One solution to this problem is simply to buy more memory. Particularly 
in the case of storage devices with removable media, such as cassettes, 
floppy disks, magnetic tape and even paper tape, additional media can be 
purchased and used as necessary. But even here economics will eventually 
limit the amount of storage available. 

An alternative approach is to try to make better use of existing storage 
media. This is where text compression can be of great use. The idea of 
text compression is to reduce the amount of space needed to store a file 
by compressing it, making it smaller. Compression is accomplished by 
changing the way in which the file is represented. The encoding procedure 
is performed in such a way that it is reversible; that is, it can later be 
decoded to produce the original uncompressed file. This is illustrated in 
figure 1. The hope is that the encoded version of the file will be smaller 
than the original file, hence space will be saved. 

The cost of this space saving is processor time. Additional processor 
time will be needed to encode and decode the compressed files as they are 
processed. However, it should be noted that microprocessors are seldom 
processor [...] will be less on a compressed file despite its encoded 
form. This is because the input/output (I/O) transfer time for a 
compressed file is less than the transfer time for an uncompressed file, 
since there are fewer bits to read or write. Therefore, I/O bound 
programs, like assemblers and loaders, may execute faster on compressed 
files. 

The basic idea of text compression is 
to find an encoding method that takes 
up minimal space. Many algorithms for 
text compression have been invented, 
and we present some of them here. In 
general, these algorithms will work for 
any type of data, such as numeric, 
character string, and so on; but for pur- 
poses of this article we limit ourselves to 
text (ie: strings of characters). This will 
include programs, documentation, mail- 
ing lists, data, and many other files 
stored in computers. In fact, object pro- 
grams, if considered as simply strings of 
bytes, can also be compressed, although 
this must be done carefully. 

Text compression is accomplished by 
careful selection of the representation of 
the information in the compressed file. 
For many small computer systems, the 
ASCII code is generally used to repre- 
sent characters. The main advantage of 
the ASCII code is that the representation 
is standard and easy to define. A major 


Figure 1: The text compression process (ENCODING and DECODING, applied 
to the ORIGINAL FILE). 


Figure 2: A file which can benefit from 
simple text compression techniques. The 
original file is a 24 by 80 character video 
display image consisting of 1920 char- 
acters. Deleting trailing blanks and using 
tabs set for every eight columns will 
reduce the size of this file to 412 
characters, a savings of 78.6%. 


disadvantage is its poor space utiliza- 
tion. ASCII is a 7-bit code, while most 
processors handle 8-bit bytes. Thus, 1 bit 
out of 8 (ie: 12.5%) is wasted simply 
because a 7-bit character code is used in 
an 8-bit byte. Further, most control 
codes are seldom used, and many ap- 
plications do not need both uppercase 
and lowercase characters. Thus, another 
bit can generally be reclaimed with 
ease, providing at least 25% savings in 
storage space. Many of the algorithms 
presented here can turn these extra bits 
into even greater savings of space. 
Notice, however, that this approach 
requires a description of how the com- 
pressed file is to be represented. This 
description commonly consists of the 
encoding and decoding routines. The 
savings which result from text compres- 
sion must be balanced against both the 
additional processor time for encoding 
and decoding, and the storage space 
necessary for the encoding and 
decoding routines. Also, different types 
of files may be best encoded by dif- 
ferent methods, so several different en- 
coding and decoding routines may be 
necessary. 


Trailing Blanks and Tabs 


A simple approach to compression for 
text files, but not for object code files, is 
eliminating blanks which come at the 
ends of lines before the carriage return 
and line feed characters. These are 
known as trailing blanks. For systems 
which store large amounts of assembly 
language, BASIC or FORTRAN pro- 
grams, much of each line will be blank. 
Any trailing blanks can be deleted 
without changing the meaning of the 
file. 
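
As a minimal sketch of this step in Python (the function name is invented, and lines are assumed to be separated by a newline character rather than a carriage return and line feed pair):

```python
def strip_trailing_blanks(text):
    """Remove the blanks at the end of every line; nothing else is changed."""
    return "\n".join(line.rstrip(" ") for line in text.split("\n"))

source = "10 LET A=1     \n20 PRINT A         \n"
packed = strip_trailing_blanks(source)
```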

Tabs can be used to reduce the 
number of blanks elsewhere in a line. 
Particularly with block structured pro- 
grams, such as ALGOL, Pascal, or PL/I, 
or with column-oriented languages such 
as FORTRAN or assembly language, tabs 
can be quite effective in text compres- 
sion. Two varieties of tabbing 
mechanisms can be used. One is called 
fixed tab stops. In this case, tab stops oc- 
cur every n columns, where n is a 
system-wide constant. Typically n=8, 
although some studies have shown that 
n=4 or n=5 will produce additional 
savings. 
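
Fixed tab stops might be applied as in the following Python sketch. It assumes the input line contains no tab characters of its own, and it replaces a run of blanks only when the run is at least two blanks long and ends exactly at a tab stop; the function names are invented.

```python
def compress_tabs(line, n=8):
    """Replace blank runs that end at a tab stop (every n columns) with a tab."""
    out = []
    col = 0
    while col < len(line):
        stop = (col // n + 1) * n            # next tab stop after this column
        chunk = line[col:stop]
        stripped = chunk.rstrip(" ")
        reaches_stop = len(chunk) == stop - col
        if reaches_stop and len(chunk) - len(stripped) >= 2:
            out.append(stripped + "\t")      # the tab stands for the blank run
        else:
            out.append(chunk)
        col = stop
    return "".join(out)

def expand_tabs(line, n=8):
    """Decoding: replace each tab by blanks up to the next tab stop."""
    return line.expandtabs(n)

original = "AB" + " " * 6 + "CD"
packed = compress_tabs(original)
```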

The other possibility is to use variable 
tab stops. In this case, tab stop positions 
are selected for each file separately. 
This requires a decision to select which 
tab stops would produce the best com- 
pression. In addition, it would be 
necessary to indicate with each file what 
tab settings are to be used. This can be 
done easily by appending a tab stop dic- 
tionary at the head of each file. Such a 
dictionary would be used to initialize 
tables for the decoding routine which 
would replace each tab with an ap- 
propriate number of blanks. This ap- 
proach allows different tab settings to 
be used for different programming 
languages or data sets. 


Multiple Characters 


Trailing blanks and tab mechanisms 
are used for compressing strings of 
multiple blank characters. Some ap- 
plications may result in strings of iden- 
tical nonblank characters occurring fre- 
quently. For example, picture processing 
by computer often requires storing long 
sequences of identical characters, such as the characters which produce 
figure 2. The approach here is to replace a string of n identical 
characters by the number n and 1 character, thus saving n-2 characters. 
The count can be represented as a byte. If the count exceeds 256, it can 
be output as a count of 256 followed by the character, and then another 
count and character for the remainder. 

Encoding consists of simply counting identical characters until a 
different one is found, and then outputting the count and character. 
Decoding simply expands each count and character to the appropriate 
number of characters. Obviously, n should be greater than 2 most of the 
time for this approach to succeed. If n were generally 1, this approach 
would actually double the size of the file. Since this is commonly the 
case for text files, a more sophisticated approach is generally used. 
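
The simple count-and-character scheme can be sketched in Python as follows. The function names are invented, and the run length is capped at 255, the largest value an unsigned byte holds, which is an assumption of this sketch rather than the limit given in the text.

```python
def rle_encode(text, max_run=255):
    """Turn text into (count, character) pairs, splitting runs longer than max_run."""
    pairs = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i] and j - i < max_run:
            j += 1
        pairs.append((j - i, text[i]))
        i = j
    return pairs

def rle_decode(pairs):
    """Expand each count and character back into a run."""
    return "".join(ch * count for count, ch in pairs)

picture = rle_encode("***//")
prose = rle_encode("abc")      # three pairs for three characters: twice the size
```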

We wish to replace sequences of identical characters by a count and 
character, but leave single or double characters alone. The problem is 
representing the multiple characters in such a way that the count is not 
misinterpreted as a character. A common solution is to use an escape 
sequence, which is a means of indicating that a special interpretation 
should be applied to the characters which follow. To create an escape 
sequence, choose any character which is seldom and preferably never used. 
For example, in ASCII one of the control codes or special characters 
might be used. ASCII even provides an escape character, but if it is 
already being used for another purpose, any other character code can be 
used. Now a sequence of n identical characters would be represented by 
the escape character, the value n, and the character to be repeated. 
Figure 3 shows the text of figure 2 compressed by this method. 

This allows normal text to be 
represented normally, except for the 
escape character. The problem we must 
now solve is how to represent the escape 
character if it occurs in the input (ie: un- 
compressed) text. If we simply copy it to 
the compressed file, the decoder will in- 
correctly think it is the start of an escape 
sequence and interpret the following 
two characters as indicating a sequence 
of identical characters. This is essential- 
ly the same problem that language 
designers face in trying to represent a 
quoted string consisting of a quote. 
Several approaches to this problem can 
be used: outlaw all occurrences of the 
escape character; replace all escape 
characters by a special escape sequence 
such as one with a 0 count; replace all 
escape characters by an escape 
character, a count of 1, and an escape 
character, treating it as a sequence of 
length 1. Any of these approaches will 
allow a file to be encoded and decoded 
easily and correctly. 
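
A sketch of the escape-sequence scheme in Python follows. It takes the third approach above, writing an escape character in the input as a sequence with its own count; the choice of $ as the escape character follows figure 3, while the function names and the limit of 255 per run are assumptions of the sketch.

```python
ESC = "$"

def encode(text, esc=ESC):
    """Replace runs of 3 or more identical characters by esc, count, character."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i] and j - i < 255:
            j += 1
        run, ch = j - i, text[i]
        if run >= 3 or ch == esc:
            out.append(esc + chr(run) + ch)   # count stored as a single character
        else:
            out.append(ch * run)              # singles and doubles are left alone
        i = j
    return "".join(out)

def decode(data, esc=ESC):
    """Expand every escape sequence; copy everything else unchanged."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == esc:
            out.append(data[i + 2] * ord(data[i + 1]))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

sample = "ab****c$$d" + " " * 40
packed = encode(sample)
```

The decoder reads the count by position, so it does not matter if the count happens to have the same code as the escape character.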

Note that in choosing an escape 
character, we can always use the same 
one (ie: a system-wide constant) or we 
can select a different one for every file. 
If we choose a different one for every 
file, we must make a preliminary pass 
through the file to look at all characters 
used and find one which is not used. We 
should then append the escape 
character at the beginning of the file to 
allow the decoding algorithm to know 
what character is used as the escape 
character. 


Keyword Replacement 


A very common type of file stored in 
computer systems is a program file. Pro- 
grams offer great possibilities for text 
compression because of their stylized 
form and syntax. The techniques of 
deleting trailing blanks and using tabs to 
replace leading blanks can reduce 
storage requirements considerably. But 
an even larger gain can be made from 
keyword replacement. 

Most programming languages use a 
number of keywords or reserved words: 
in FORTRAN, such words as INTEGER, 
FORMAT, CALL and so on; in BASIC, 
such words as LET, READ, PRINT, REM 
and so on. These words are used 
throughout these programs and are 
prime candidates for text compression. 

Two techniques are commonly used. 
First, one can replace each keyword by 


$32 $3*$4 /**.@$30 /$5*\ $6* \ @$30 $7* $7*@$30 $5*” '$6*@$31 
**/$6 \ **'@$30 $3.$9 /$3* \ @$29 $4* \ $7 $6*@S29 $5*\. . $6*@$29 
\$4*' $6* \ **/@$31 " $6*/.@$36 \$4* \ @$43 \ @$43 *@843 *@343/ 
/$3* \ @$30 ..$9 ./ /$7* \ @$27 /$6*\$6 */$3*/'@$26 /**' "\**\ $3 
-*/$3'@$26 '$6 \** /'@$35 *$3\ @$35 \"/@$36 \*@$37 \@$37 \@ 


Figure 3: Further compression of the file shown in figure 2 done by 
replacing multiple identical characters with an escape sequence. The 
escape sequence in this case is the escape character ($) followed by the 
number of repetitions and the character to be repeated. This scheme is 
useful only when the repeat count is greater than 2. The count would 
normally fit into 1 byte, but is here shown in decimal. The character @ 
represents the carriage return and line feed. Only 287 characters are 
needed to represent the file in figure 2 using this representation. This 
reduces the file to 14.9% of its original size. 




Figure 4: A BASIC program before and after keyword replacement. Each 
keyword is replaced by the escape character $ and its number (for 
example, THEN 110 becomes $6 110). The keywords are (1) GO TO, (2) IF, 
(3) LET, (4) PRINT, (5) READ, (6) THEN. 


an escape sequence. The escape se- 
quence might consist of the escape 
code, followed by a number, indicating 
which keyword is being used. This has 
the advantage of allowing a large 
number of reserved words up to the 
number which can be held in 1 byte, and 
can be particularly useful for assembly 
language symbolic op codes. 

An alternative approach is to look 
through the existing character code for 
unused character codes. For example, if 
ASCII is being used, many of the control 
codes, some of the special characters, 
and perhaps the lowercase characters 
are not normally used. If 7-bit ASCII is 
being used with 8-bit bytes, then the ex- 
tra bit can be used to define 128 new 
unused codes. These unused codes are 
paired up with the most frequently oc- 
curring reserved words. One code 
should be reserved for use as an escape 
or quote character, in case any of the 
codes assumed to be unused should 
happen to be used in an input file. 

For encoding, the input file is scanned 
for reserved words and each reserved 
word is replaced by the appropriate 
special code as illustrated by figure 4. If 
any of the special codes should show up 
in the input stream, they are replaced by 
the two-character sequence of the 
escape code followed by the input 
character. For decoding, all special 
codes are replaced by their equivalent 
keyword, except that any character 
preceded by an escape code is copied 
directly to the output, with no replace- 
ment. 
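
The encoding and decoding steps just described can be sketched in Python. The sketch assumes 7-bit ASCII text held in 8-bit bytes, so that the codes from hexadecimal 80 upward are free; the keyword list is the one given with figure 4, and the particular code assignments, the escape code FF and the function names are all invented for the illustration.

```python
KEYWORDS = ["GO TO", "IF", "LET", "PRINT", "READ", "THEN"]
ESCAPE = 0xFF                                   # hypothetical escape code
CODES = {kw.encode("ascii"): 0x80 + i for i, kw in enumerate(KEYWORDS)}
WORDS = {code: kw for kw, code in CODES.items()}

def encode(data: bytes) -> bytes:
    """Replace each keyword by its special code; escape stray special codes."""
    out = bytearray()
    i = 0
    while i < len(data):
        for kw, code in CODES.items():
            if data.startswith(kw, i):
                out.append(code)
                i += len(kw)
                break
        else:
            if data[i] >= 0x80:                 # input byte looks like a special code
                out.append(ESCAPE)
            out.append(data[i])
            i += 1
    return bytes(out)

def decode(data: bytes) -> bytes:
    """Expand special codes; a byte after the escape code is copied directly."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESCAPE:
            out.append(data[i + 1])
            i += 2
        elif b in WORDS:
            out += WORDS[b]
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)

program = b"20 IF A>0 THEN 80\n30 LET B=A\n"
packed = encode(program)
```

This sketch replaces a keyword wherever its letters occur, even inside a longer name. The result still decodes correctly, since the replacement is reversible.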

At this point, a problem may become 
apparent. Note that the keywords for 
any particular language are fixed and 
relatively small in number, but that the 
keywords vary from language to 
language. Hence, the appropriate cor- 
respondence between special codes and 
reserved words may vary greatly. In 
single language systems, such as those 
which offer only BASIC, this is not a 
problem, but more general systems need 
to consider this problem. 

Several solutions are available. First, 
one can simply use separate encoding 
and decoding routines for each 
language, leaving it to the programmer 
to use the appropriate one. Second, one 
can tag each compressed file with a byte 
which indicates if this is a BASIC com- 
pressed file, a FORTRAN compressed 
file, or a type X compressed file. Then 
the encoder must either be told how to 
encode the file or be able to guess or 
compute whether it is a FORTRAN, 
BASIC, or type X file and apply the ap- 
propriate compression algorithm. The 
compressed file is tagged as it is encod- 
ed. The decoder looks at the tag and 
uses the appropriate decoding scheme. 

A third approach is more general, but 
potentially more expensive. The dif- 
ference between the encoding and 
decoding algorithms for the different 
types of files is simply the table of pair- 
ings between keywords and character 
codes. Therefore, another approach is to 
prefix each compressed file with a dic- 
tionary of character codes and reserved 
word pairs. The dictionary explains the 
meanings of the special character codes 
by indicating the reserved words for 
which they stand. 


Substring Abbreviation 


The idea of appending an abbre- 
viation dictionary to the front of a com- 
pressed file opens the way to using the 
keyword replacement scheme for more 
general files. The idea is quite simple. 
Pick out those sequences of characters 
which occur most frequently in a file 
and replace them with a special 
character code. To allow decoding, we 
append a dictionary at the beginning of 
each file to show which special 
character codes correspond to each 
replaced character string. This ap- 
proach can yield very good text com- 
pression, especially for programs or 
natural language text, since keywords, 
variable names and some words, like 
the, and, and so on, are used very fre- 
quently. 

But there are some problems with this approach also. The major problem 
is selecting the character strings to be abbreviated. With programs 
written in particular languages, keywords occur frequently and so are a 
safe bet for substitution, but what constitutes appropriate character 
strings for general replacement? These can be determined only by 
examining the file, since the appropriate strings will vary from file to 
file. 

The objective, of course, is to realize the greatest savings in space. 
Here we are limited mainly by the number of codes available for 
substitution. If we use unused codes in the existing character set, we 
are limited to from ten to fifty abbreviation codes, typically. If we 
extend the character set (eg: by using 8-bit codes with 7-bit ASCII), 
then we may have as many as 128 codes available. Using an escape sequence 
may provide up to 256, but at a cost of at least 2 characters per 
abbreviation. In all cases, the number of codes available will always be 
limited to, say, m. Thus we need to pick those m strings for 
abbreviation which will result in the greatest space savings. 

We do not always want to pick merely the most frequently occurring m 
strings. Consider the two strings, to and text compression. If to occurs 
one hundred times and text compression only fifteen times, which should 
we replace? Replacing the two-character sequence to by a single 
abbreviation code saves only one character (assuming 1-byte abbreviation 
code) per occurrence, or a total of one hundred characters. Replacing the 
sixteen-character sequence text compression saves fifteen characters per 
occurrence, or 225 characters total. Thus, in general we wish to replace 
that character sequence whose product of length and frequency is 
greatest. An example of substring replacement is shown in figure 5. 
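
The arithmetic of this comparison is easily checked. The small Python function below is only an illustration; its name and the code_len parameter are invented.

```python
def savings(string, occurrences, code_len=1):
    """Characters saved by replacing every occurrence with a code of code_len."""
    return (len(string) - code_len) * occurrences

short_gain = savings("to", 100)
long_gain = savings("text compression", 15)
```

With a 2-character escape sequence as the abbreviation (code_len=2), replacing to would save nothing at all.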
The encoding problem then becomes 
that of finding the m sequences whose 
length-frequency product is greatest, 
replacing all occurrences of them with 
the m abbreviation codes, and append- 
ing the abbreviation dictionary at the 
front of the compressed file. The 
decoding problem reduces to merely 
reading in the abbreviation dictionary 
and replacing all abbreviation codes 
with the appropriate character se- 
quence. 

Dictionary: $A "the" 
            $B "text compression" 
            $C "computer" 


Text: 


This paper is concerned with $A use of $B in $C systems, where $A amount of 
$C storage is limited. 


Figure 5: Text compression by substring replacement. Substrings are 
replaced by abbreviation codes: here we use escape sequences. A dic- 
tionary is placed at the beginning of the file to define the meanings of 
the abbreviations. 


The only real difficulty is finding the m sequences to be abbreviated. 
No really good solution to this problem is known. The best solution I 
have seen works as follows: first, make one pass through the file to 
compute the most frequently occurring pairs. There should be no more 
than 2500 of these, and probably many fewer. Compute the frequency of 
these pairs and keep only the m or 2m most frequent. Now consider that 
any sequence of length 3 both begins and ends with a subsequence of 
length 2, and that these 2 subsequences of length 2 must be at least as 
frequent as the length 3 sequence. That is, if there are twenty-three 
occurrences of abc, then there must be at least twenty-three ab and at 
least twenty-three bc. Thus we can make another pass through the file, 
counting the frequency of subsequences of length 3, but limiting 
ourselves to those sequences which begin and end with subsequences of 
length 2 which are also frequent. Next we can make another pass for 
length 4 (limiting the sequences to those with frequent length 3 
subsequences), another pass for length 5, and so on until we decide to 
stop. We can stop either when our last pass has produced no new 
sequences whose frequency-length product exceeds the previous set, or 
after a fixed number of passes. 
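
The passes described above might be organized as in this Python sketch. It stops after a fixed number of passes (max_len), counts overlapping occurrences, and ranks candidates by the product of length and frequency; the function name and parameters are invented, and no claim is made that this is the best selection method.

```python
from collections import Counter

def frequent_substrings(text, m, max_len=8):
    """Return up to m substrings with the largest length * frequency product."""
    # Pass 1: count every pair and keep only the 2m most frequent.
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    frequent = dict(counts.most_common(2 * m))
    candidates = dict(frequent)
    length = 3
    while frequent and length <= max_len:
        counts = Counter()
        for i in range(len(text) - length + 1):
            s = text[i:i + length]
            # Count s only if it begins and ends with a frequent shorter sequence.
            if s[:-1] in frequent and s[1:] in frequent:
                counts[s] += 1
        frequent = dict(counts.most_common(2 * m))
        candidates.update(frequent)
        length += 1
    ranked = sorted(candidates, key=lambda s: len(s) * candidates[s], reverse=True)
    return ranked[:m]

best = frequent_substrings("abcabcabcabcxy", 1, max_len=3)
```

A practical version would also have to deal with the fact that the best candidates often overlap one another (a long string and the same string less its first character, for instance), which this sketch ignores.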


Huffman Coding 


All of the schemes for text compres- 
sion discussed so far are similar in the 
sense that they confine themselves to 
working within the given character code 
and byte structure. Even more savings 
can result from recoding the character 
code representation itself. Almost all 
character code representations use a 
fixed code size: 6 bits for binary coded 
decimal (BCD), 7 bits for ASCII and 8 bits 
for EBCDIC. This can be very wasteful of 
space. 

Consider the simple problem of en- 
coding the four characters: A, B, C, and 
D. If we use a fixed code size, then we 
could encode each character with 2 bits, 
as follows: 


A 00 
B 01 
C 10 
D 11 


Code        Letter 

000000      V 
000001000   Z 
000001001   J 
00000101    Q 
00000110    X 
00000111    K 
00001       P 
0001        H 
001         T 
01000       M 
01001       F 
01010       U 
010110      W 
010111      B 
011         E 
1000        S 
10010       C 
100110      Y 
100111      G 
1010        I 
1011        N 
1100        R 
11010       L 
11011       D 
1110        O 
1111        A 


Figure 6: Huffman code for the letters of the English language, based on 
the probabilities (ie: frequency of occurrence) of the letters in English. 
The code length is inversely proportional to the frequency of occur- 
rence of a given letter in much the same manner as Morse code. Code 
lengths vary from 9 bits (for z and j) to 3 bits (for e and t). The average 
length is 4.1885 bits per letter. Five bits would be necessary for a fixed 
length code, a space saving of 16%. 


But suppose that the letter A occurs 
50% of the time in the text, B occurs 
25% and C and D split the remaining 
25% equally. Then the following 
variable length character code will pro- 
duce a shorter average text length: 


A 0
B 10
C 110
D 111


To compute the average text length, 
consider that out of n characters, n/2 
will be A which requires only 1 bit, n/4 
will be B for 2 bits each and the remain- 
ing n/4 will be C or D for 3 bits each. 
Thus the total number of bits to repre- 
sent n characters is: 


1(n/2) + 2 (n/4) + 3 (n/4) = 1.75n 


Comparing this with the 2n bits needed 
for the fixed length code, we see that we 
have saved 12.5% of the total file size.

Variable length coding and decoding
is somewhat more complex than fixed 
length coding, but not really difficult. It 
involves much more bit manipulation. 
To encode a string like ABAABCDAB, 
we simply concatenate the bit represen- 
tations of each character, packing 
across byte boundaries as necessary: 


A  B   A  A  B   C    D    A  B
0  10  0  0  10  110  111  0  10


To decode, we must scan from left to 
right, looking at each bit. For the string 
01001100, we notice that the first bit is a 
0. Only A starts with 0, so our first 
character is an A. The next bit is a 1, so it
could be a B, C or D, but looking at the
next bit we see that the next character
must be a B. We remove the 2 bits for
the B and continue. The next bit is 0, so
the next character is an A. The following
bit is a 1, signifying either a B, C or D.
The next bit is a 1, signifying a C or D.
Finally the next bit indicates a C. The
last character is an A. So our decoded 
text is ABACA. 
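
The encoding and decoding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and bits are held as a string of "0" and "1" characters rather than packed into bytes, which a real implementation would do.

```python
# The four-character variable length code from the text.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(text):
    """Concatenate the bit representation of each character."""
    return "".join(CODE[ch] for ch in text)


def decode(bits):
    """Scan left to right, emitting a character as soon as the bits
    read so far match a complete code."""
    inverse = {code: ch for ch, code in CODE.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)


def average_bits(probabilities):
    """Average code length in bits per character."""
    return sum(p * len(CODE[ch]) for ch, p in probabilities.items())
```

With this sketch, `decode("01001100")` gives `"ABACA"`, matching the walk-through above, and `average_bits({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})` gives 1.75, the figure computed earlier.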


Computer-stored text files can benefit 
greatly from Huffman coding. Huffman 
coding can be used anytime that the 
probabilities of the character codes are 
not equal. In fact, the more unequal the 
probabilities, the better the compression 
with a Huffman coding. Looking at a 
table of frequencies of the letters in 
English, we can see that they are quite 
unequal, and hence can be compressed 
nicely with Huffman coding. 

To construct a Huffman code, a very 
simple algorithm is used. Refer to figure 
6. First, it is necessary to compute the 
probabilities of the characters to be en- 
coded. This requires one pass through 
some sample text, a part of a file, the 
whole file or several files, as desired, 
counting the occurrences of different 
characters. Then we need to sort the 
characters according to their frequency. 
Take the two least frequently occurring 
characters, and combine them into a 
super character whose frequency is the
sum of the two individual characters.
The code for each of the two characters
will be the code for the super character
followed by a 0 for one character and 1
for the other. Now delete the two least
frequently used characters from the list
and insert the new super character into
the list at the appropriate place for its
frequency. Continue this process until
all characters and super characters are
combined into one super character. The
result is a Huffman code of minimal
average code length. The Huffman code
may best be seen as a binary tree with
the terminal nodes (ie: the leaves) being
the characters which are encoded.
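
The construction can be sketched in Python as follows. This is a minimal sketch, not the authors' program: the name `huffman_code` is hypothetical, and a priority queue (`heapq`) stands in for the sorted list described in the text. Each queue entry is a super character that carries the partial codes of the characters merged into it.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Return {character: bit string} by repeatedly merging the two
    least frequent entries into a super character."""
    if len(frequencies) == 1:
        # A lone character still needs one bit.
        return {symbol: "0" for symbol in frequencies}
    tiebreak = count()  # keeps entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), {symbol: ""})
            for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq0, _, group0 = heapq.heappop(heap)  # least frequent
        freq1, _, group1 = heapq.heappop(heap)  # next least frequent
        # Prefixing a bit to every member is the same as giving each
        # member "the super character's code followed by 0 or 1".
        merged = {s: "0" + code for s, code in group0.items()}
        merged.update({s: "1" + code for s, code in group1.items()})
        heapq.heappush(heap, (freq0 + freq1, next(tiebreak), merged))
    return heap[0][2]
```

For the earlier example, `huffman_code({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})` yields code lengths of 1, 2, 3 and 3 bits. Which of two equally frequent characters receives the 0 and which the 1 is arbitrary.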

Huffman coding can be quite successful
in text compression, in extreme
cases reducing the size of a file by more
than half. The basic technique can be
improved upon in a number of ways. For
example, pairs of characters, rather than
single characters, can be used as the
basis of encoding. This requires a much
larger table of character frequencies,
since now we need to compute the frequencies
of character pairs, and larger
tables of character pair and Huffman
code associations, but can result in
greater savings.

Another possibility is to use condi- 
tional Huffman coding. The objective 
here is to utilize the fact that the prob- 
ability (ie: frequency) of a character will 
vary depending upon what character 
precedes it. For example, compare the
probability of a U following a Q (nearly 
1) to the probability of a U following a U 
(nearly 0). So an optimal encoding 
should use a very short code for a U 
which follows Q and can use a very long 
code for a U which follows a U. The en- 
coding algorithm involves computing 
the frequency with which each charac- 
ter follows every other character. A sep- 
arate Huffman code is then computed 
for the characters which follow each 
character. The encoding scheme remem- 
bers the last character encoded and uses 
that to select the code to be used 
for the next character. The decoding 
algorithm must also remember the last 
character decoded in order to be able to 
select the correct decoding algorithm. 
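
A sketch of conditional Huffman coding in Python is given below. All names are hypothetical, and two details are assumptions not stated in the text: the first character of a message is coded with an ordinary (unconditional) Huffman code, and the text to be encoded contains only character pairs that also occur in the sample.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count


def huffman_code(frequencies):
    """Ordinary Huffman construction over a {symbol: count} mapping."""
    if len(frequencies) == 1:
        return {symbol: "0" for symbol in frequencies}
    tiebreak = count()
    heap = [(f, next(tiebreak), {s: ""}) for s, f in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, g0 = heapq.heappop(heap)
        f1, _, g1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in g0.items()}
        merged.update({s: "1" + c for s, c in g1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def build_tables(sample):
    """One code for the first character, plus one code per preceding character."""
    first = huffman_code(Counter(sample))
    followers = defaultdict(Counter)
    for prev, ch in zip(sample, sample[1:]):
        followers[prev][ch] += 1
    return first, {prev: huffman_code(c) for prev, c in followers.items()}


def encode(text, first, tables):
    """Select each character's code by the character just encoded."""
    bits = [first[text[0]]]
    bits += [tables[prev][ch] for prev, ch in zip(text, text[1:])]
    return "".join(bits)


def decode(bits, first, tables):
    """Remember the last decoded character to pick the next decoding table."""
    inverse = {code: ch for ch, code in first.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            ch = inverse[current]
            out.append(ch)
            current = ""
            inverse = {code: s for s, code in tables.get(ch, {}).items()}
    return "".join(out)
```

With the sample text `"QUEUE QUIET QUOTE"`, the table selected after a Q contains only U, so that U costs a single bit.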

Huffman codes are really quite 
simple, but they can be made more 
sophisticated to achieve increased text 
compression. However, even with sim- 
ple Huffman codes, some problems can 
arise. First, notice that Huffman en- 
coding and decoding both involve a 
great deal of bit manipulation, which 
can be very slow to program. Second, 
the best compression is achieved if a 


Huffman code can take advantage of 
the unequal frequencies of characters in 
a file, but these will differ from file to 
file. Thus a separate encoding may be 
best for each file. This can be done by 
appending the code at the front of a file, 
as with the dictionaries used for abbre- 
viations, but this increases the size of 
the file significantly for small files. 

Third, the variable length nature
of Huffman codes can make them extremely
vulnerable to transmission or
storage errors. In a fixed length code, if 
one bit is changed, only that one 
character is affected, while with Huff- 
man codes, both that character and all 
succeeding characters may be decoded 
incorrectly because of a mistake in the 
assumed length of the incorrect char- 
acter. A similar problem would happen 
to a fixed length code if a bit were 
dropped or added. Thus, for safety, it is 
necessary to add error detection and 
correction redundancy back into the 
file, increasing its size. 
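
The effect of a single changed bit can be tried out with the short Python sketch below. The names are hypothetical, and the two code tables are the four-character examples used earlier in this article.

```python
VARIABLE = {"A": "0", "B": "10", "C": "110", "D": "111"}
FIXED = {"A": "00", "B": "01", "C": "10", "D": "11"}


def encode(text, table):
    return "".join(table[ch] for ch in text)


def decode(bits, table):
    """Emit a character whenever the bits read so far form a complete code."""
    inverse = {code: ch for ch, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


def flip(bits, index):
    """Simulate a storage or transmission error in one bit."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def damaged(text, table, index):
    """Encode, corrupt one bit, and decode again."""
    return decode(flip(encode(text, table), index), table)
```

For the text ABAABCDAB with bit 1 flipped, the fixed length code decodes to BBAABCDAB (one wrong character), while the variable length code decodes to AAAAABCDAB: the damaged B becomes several As, the character count changes, and every later character lands in a different position. This particular code happens to fall back into step; in general there is no such guarantee.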

Still there are environments in which 
Huffman coding can be quite useful. 
Consider a word-processing system stor- 
ing files on a low-speed serial device 
such as a cassette. Since the system is 
special purpose, one can compute the 
expected frequencies of English 
characters and use one Huffman code 
for all files. Encoding and decoding 
would be done automatically by the 
tape driver routines. Alternatively the 
encoding and decoding could be built 
into the tape drive hardware as special 
purpose logic or into a small processor 
with a read-only memory encoding/de- 
coding table. This encoding/decoding 
approach would be totally transparent 
to the user. The only effect on the user 
would be the ability to store a larger, but 
variable number of “characters” on a 
fixed amount of tape. 


Conclusions 

The amount of storage space needed 
to store information can be greatly 
reduced by simple text compression 
techniques like the ones we have 
presented here. Each of the techniques 
presented can save some space in many 
files. And many of the techniques can be 
used one after another to achieve more 
and more compression. Text compres- 
sion can be a simple and effective 
method of increasing the amount of 
storage available in exchange for some 
processor cycles.


REFERENCES 


1. deMaine, P A D. The Integral Family of Reversible
Compressors. Computer Science Department,
Pennsylvania State University, 1971.

2. Dishon, Y. "Data Compaction in Computer
Systems." Computer Design, volume 16,
number 4, April 1977, pages 85 thru 90.

3. Huffman, D A. "Method for Construction of
Minimum-Redundancy Codes." Proceedings of
the IRE, September 1952, pages 1098 thru 1101.

4. Knuth, D E. The Art of Computer Programming:
Volume 1, Fundamental Algorithms, second edition.
Addison-Wesley, Reading MA, 1973.

5. Peterson, J, J Bitner, and J Howard. On the Selection
of Optimal Tab Settings. Department of
Computer Sciences, University of Texas, December 1977.

6. Rubin, F. "Experiments in Text File Compression."
Communications of the ACM, volume 19, number
11, November 1976, pages 617 thru 623.




Advanced Techniques


About This Section


... Programming Techniques series. It covers topics
of interest to advanced programmers
who want to tackle some of the more
fundamental subjects of computer
science. Some of the topics can only be
briefly introduced here, and the reader
is encouraged to seek out the appropriate
course or textbook in computer
science to continue the study.

Stacks and interrupts are only one
step removed from the actual microcomputer
hardware, but advanced applications
often require dealing with
them. You might find these topics
useful, for instance, when dealing with
language processors (ie: compilers, interpreters,
assemblers, etc), real-time processing,
or high-level operating systems.

Another topic of interest to advanced
programmers is program optimization.
Under certain circumstances, the execution
speed of a routine can be critical to
the successful completion of the assigned
task. This is especially true when
dealing with real-time occurrences, but
can be just as important when dealing
with a more-or-less trivial programming
problem. No one wants to wait 2 hours
for a program to update a master file
with only a few transaction records!

One of those seemingly never-ending 
subjects is queuing theory. It has 
become an area of specialization within 
computer science in recent years, but 
don't let its depth keep you from apply- 
ing some of the simple techniques of us- 
ing queues in your own programs. Again, 
queues are useful in operating systems, 
communication networks, and real-time 
processing. 

The other subjects covered in this sec- 
tion (eg: BNF grammar, hand assembl- 
ing, Polish postfix notation, the Ex- 
clusive OR, etc) will help you make the 
most of your programming efforts. 
Remember, even if you are only a begin- 
ning programmer, don’t be afraid to 
tackle some of the more formidable 
topics in programming. The only way to 
sharpen your skills is to experiment with 
new techniques, even if you don’t have a 
ready use for them. And after all, isn’t 
that one of the main reasons for having a 
microcomputer — the opportunity to 
experiment? 




The Algebra for the Boolean Exclusive OR:
With an Application to Hamming Codes


Webb Simmons


Many of us have been repeatedly
drilled in the Boolean algebra operations
AND and OR. Less well-known is
the full set of operations that can be performed
using XOR (exclusive OR). In
order to eliminate any possibility of confusion
with other operations, I will use
the notation XOR(list), where (list)
denotes a list of variables separated by
commas. Within one sentence, the
repeated use of the list notation means
the same list in each place is used. The
truth table for the exclusive OR is:


A  B  XOR(A,B)
0  0     0
0  1     1
1  0     1
1  1     0


The first theorem is:
XOR(A,true) = NOT A
or:
XOR(A,1) = NOT A
Conversely:
XOR(A,false) = A
or:
XOR(A,0) = A.


Whenever the string of arguments to
the exclusive OR contains repetitions of
the same variable, an even number of
such repetitions can be removed; thus:


XOR(A,A,B,C,C,C)=XOR(B,C). 
More generally: 
XOR(A,A, list) =XOR(list). 
We prove the above from the fact that:


XOR(list1,list2) = XOR(XOR(list1),XOR(list2))


or, by replacing list1 by A,A and list2 by
list:


XOR(A,A,list) = XOR(XOR(A,A),XOR(list))
              = XOR(false,XOR(list))
              = XOR(list)


remembering that XOR(A,A)=false and XOR(A,false)=A.
The fact that: 


XOR(list1,list2) = XOR(XOR(list1), 
XOR(list2)) 


must be proven by a truth table for 
various replacements for list1 and list2. 
In table 1, I show the proof for:


XOR(A,B,C,D) = XOR(XOR(A, B), 
XOR(C,D)). 


My acceptance of “XOR(list1,list2) = 
XOR(XOR(list1),XOR(list2))” before it 
was proven in table 1 was the use of a 


lemma. After it is proven, it is no longer 
a lemma, but a theorem. 
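
The same kind of truth-table proof can be run mechanically. The Python sketch below is only an illustration: the function `XOR` mirrors the article's list notation, and `theorems_hold` is a hypothetical name. It tries every combination of four 0/1 variables.

```python
from functools import reduce
from itertools import product
from operator import xor


def XOR(*args):
    """Exclusive OR of a list of 0/1 values (0 for an empty list)."""
    return reduce(xor, args, 0)


def theorems_hold():
    """Check the theorems above for every assignment of A, B, C and D."""
    for a, b, c, d in product((0, 1), repeat=4):
        if XOR(a, 1) != 1 - a:        # XOR(A,true) = NOT A
            return False
        if XOR(a, 0) != a:            # XOR(A,false) = A
            return False
        if XOR(a, a, b, c, c, c) != XOR(b, c):   # pairs can be removed
            return False
        if XOR(a, b, c, d) != XOR(XOR(a, b), XOR(c, d)):  # table 1
            return False
    return True
```

Calling `theorems_hold()` returns `True`. It checks only the listed cases, not every possible split of a list into list1 and list2.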

Two included cases of our new 
theorem are: 


XOR(A,list) = XOR(A,XOR(list))
and
XOR(NOT A,list) = NOT XOR(A,list).


The term list can be replaced by
XOR(list) and will finally evaluate to
either true or false (1 or 0). Consider the
cases:


XOR(NOT A,0) = NOT XOR(A,0)
and


XOR(NOT A,1) = NOT XOR(A,1).


Each of the above can be shown to be 
true for A=1 and for A=0. 

We will finish this discussion with 
three sets of “if,then” clauses. 


if XOR(A,B,C)=false,
then A=XOR(B,C). 


The proof of this lemma is shown in 
table 2. Obviously, cyclic rotation gives 
us, for the same condition: 


B=XOR(A,C)

C=XOR(A,B). 

A B C D | E | F G | H
0 0 0 0 | 0 | 0 0 | 0
0 0 0 1 | 1 | 0 1 | 1
0 0 1 0 | 1 | 0 1 | 1
0 0 1 1 | 0 | 0 0 | 0
0 1 0 0 | 1 | 1 0 | 1
0 1 0 1 | 0 | 1 1 | 0
0 1 1 0 | 0 | 1 1 | 0
0 1 1 1 | 1 | 1 0 | 1
1 0 0 0 | 1 | 1 0 | 1
1 0 0 1 | 0 | 1 1 | 0
1 0 1 0 | 0 | 1 1 | 0
1 0 1 1 | 1 | 1 0 | 1
1 1 0 0 | 0 | 0 0 | 0
1 1 0 1 | 1 | 0 1 | 1
1 1 1 0 | 1 | 0 1 | 1
1 1 1 1 | 0 | 0 0 | 0


Table 1: Theorem: XOR(A,B,C,D) = XOR(XOR(A,B),XOR(C,D)).


Proof: let E = XOR(A,B,C,D)
       let F = XOR(A,B)
       let G = XOR(C,D)
       let H = XOR(F,G)
             = XOR(XOR(A,B),XOR(C,D))


Allowing all possible combinations for A, B, C, and D, and solving for E,
F, G, and H, we find that E always equals H. The theorem is proven to be
true.




You may also prove that: 


if XOR(A,B,C)=true,
then A = NOT XOR(B,C).


For our second "if,then" clause we
have:


if A=XOR(B,list),
then B=XOR(A,list).


The proof of this lemma is left as an ex- 
ercise for the reader, as is our final lem- 
ma: 


if XOR(A, list) =XOR(B, list), 
then A=B. 
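
For readers who would rather check the exercises than prove them, the Python sketch below tests the three "if,then" clauses exhaustively. The names are hypothetical, and the list is limited to a chosen number of variables (three by default), so this is a check for short lists rather than a general proof.

```python
from functools import reduce
from itertools import product
from operator import xor


def XOR(*args):
    """Exclusive OR of a list of 0/1 values."""
    return reduce(xor, args, 0)


def lemmas_hold(list_length=3):
    """Try every 0/1 assignment; return False at the first counterexample."""
    # if XOR(A,B,C)=false, then A=XOR(B,C)
    for a, b, c in product((0, 1), repeat=3):
        if XOR(a, b, c) == 0 and a != XOR(b, c):
            return False
    for a, b, *rest in product((0, 1), repeat=2 + list_length):
        # if A=XOR(B,list), then B=XOR(A,list)
        if a == XOR(b, *rest) and b != XOR(a, *rest):
            return False
        # if XOR(A,list)=XOR(B,list), then A=B
        if XOR(a, *rest) == XOR(b, *rest) and a != b:
            return False
    return True
```

`lemmas_hold()` returns `True`, as does `lemmas_hold(1)` for a one-variable list.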


Hamming Codes 


Exclusive OR algebra is used in conjunction
with an error correcting code
called the Hamming code which, in 4
bits, is able to encode the values 0 and 1
such that any error of 1 bit can be corrected
or, if no correction is attempted,
any error of 2 bits can be detected. The
value of 0 or 1, held in the most significant
bit of a 4 bit word, is encoded by evaluating the
exclusive OR of three arguments, where
the arguments are:


• the value itself

• the value rotated left by 1 bit position

• the value rotated left by 3 bit positions


This produces 0000 for zero and 1101 for 
one. These are decoded by performing 
the exclusive OR of three other func- 
tions: 


• the encoded value

• the encoded value rotated right 1
bit position

• the encoded value rotated right 3
bit positions


The desired value, zero or one, will be in 
the upper (ie: most significant) bit posi- 
tion if there has been no error, and the 3 
low bits will all be zero. 
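
The encoding and decoding rules can be sketched in Python as below. The function names are hypothetical. One assumption is made explicit: the value sits in the most significant bit of the 4 bit word before encoding, which is what reproduces the code words 0000 and 1101 quoted above. The `correct` function and its table of error patterns are an illustration of single-error correction only.

```python
def rotl4(value, n):
    """Rotate a 4 bit value left by n positions."""
    n %= 4
    return ((value << n) | (value >> (4 - n))) & 0xF


def rotr4(value, n):
    """Rotate a 4 bit value right by n positions."""
    return rotl4(value, 4 - (n % 4))


def encode(bit):
    """XOR of the value, the value rotated left 1, and rotated left 3."""
    value = bit << 3  # assumption: value held in the top bit
    return value ^ rotl4(value, 1) ^ rotl4(value, 3)


def decode(word):
    """XOR of the word, the word rotated right 1, and rotated right 3."""
    return word ^ rotr4(word, 1) ^ rotr4(word, 3)


# Low 3 bits of the decoded word -> the single bit assumed to be in error.
ERROR_BIT = {0b101: 0b1000, 0b110: 0b0100, 0b111: 0b0010, 0b011: 0b0001}


def correct(word):
    """Recover the value, assuming at most one bit was received in error."""
    low_bits = decode(word) & 0b111
    if low_bits:
        if low_bits not in ERROR_BIT:
            raise ValueError("more than one bit in error")
        word ^= ERROR_BIT[low_bits]
    return decode(word) >> 3
```

Here `encode(1)` is `0b1101`, `decode(0b1101)` is `0b1000`, and `decode(0b0101)` (first bit in error) is `0b0101`, whose low 3 bits are 101.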

It is an interesting fact that any one or 
two errors will cause the low 3 bits to be 
other than all zeroes. In fact, the par- 
ticular value will tell you the position of 
any 1 bit error and will blow the whistle 
for two errors. Hamming codes cannot 
be covered here, but I will show that an
error in the first bit causes the low 3 bits to
be 101. The entire value will be either
0101 or 1101. Bits received without error
will be the logical variables A, B, C, and
D. When received correctly, we have:


ABCD (the encoded value)
DABC (right rotated by 1 bit)
BCDA (right rotated by 3 bits)
----
x000 (the x is either zero or one)


which gives us the four Boolean equations:


XOR(A,D,B) = don't know
XOR(B,A,C) = 0
XOR(C,B,D) = 0
XOR(D,C,A) = 0.


If the first received bit is in error, we get
NOT A rather than A to produce:


if XOR(B,A,C)=0, then XOR(B,NOT A,C)=1.
if XOR(C,B,D)=0, then there is no change.
if XOR(D,C,A)=0, then XOR(D,C,NOT A)=1.


This gives us:


?101 = XOR(three arguments)


the predicted result. Actually, the first
question mark can be shown to be NOT x
as follows:


if XOR(A,D,B)=x,
then XOR(NOT A,D,B)=NOT x.


As an exercise, you may wish to prove
that an error in B alone causes a decoded
result whose low 3 bits are 110, that C
alone gives 111, and that D alone gives
011. Then work out the results
of any two errors. As expected, three or
four errors should lead you to the wrong
value.


(a)              (b)
A B C | D        A B C | D | E
0 0 0 | 0        0 0 0 | 0 | 0
0 0 1 | 1        0 1 1 | 0 | 0
0 1 0 | 1        1 0 1 | 0 | 1
0 1 1 | 0        1 1 0 | 0 | 1
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 1


Table 2: Method used to prove that, if XOR(A,B,C)=0, then
A=XOR(B,C). If D=XOR(A,B,C)=0, then we delete all rows in table 2a
for which D=1. This produces table 2b. We now let E=XOR(B,C). It is
evident that column E equals column A. Therefore: if XOR(A,B,C)=0,
then A=XOR(B,C). Similar exercises will show that, under the same initial
assumption, B=XOR(A,C) and C=XOR(A,B).


Stacks in Microprocessors


Radhakrishan, MV Bhat


The stack, or last-in first-out (LIFO)
data structure, has become an essential
tool in computer systems. There are two
major operations associated with this
data structure:


PUSH  which places a new data item
      on top of the existing ones in
      the stack.

POP   which removes the topmost
      element of the stack for succeeding
      operations.


A spring-loaded plate holder in a
cafeteria is a good example of a stack,
since addition and removal of items occur
at the same end in a last in, first out
sequence. (See figure 1.) When the
capacity of a stack is "n" items, then
n+1 consecutive PUSH operations will
cause the stack to overflow. Similarly,
popping an empty stack creates an
underflow. Even though stack underflow
may not occur intentionally, programmers
should account for this condition.
Stack overflow is more probable when
the stack capacity is not large enough to
accommodate all the occurring conditions
simultaneously.


(Figure 1 panels: EMPTY, PUSH A, PUSH X, PUSH K)

Figure 1: A sample three-word stack. A PUSH command causes one
piece of data to be PUSHed onto the stack; the resident data is pushed
downward to make room. Similarly, a POP command removes the topmost
piece of data and shifts the rest of the stack upward.


Stack size is one of the major design
parameters in processor architecture.
For instance, the earlier Intel 8008 processor
had a built-in seven-level
subroutine control stack which was later
increased to a more general stack
pointer which could range throughout
memory in the 8080.

In the software realization of stacks, a
programmable memory location is used
along with an address pointer, called the
stack pointer (SP). The SP points to the
memory location that holds the top element
of the stack; the pointer is updated
(incremented or decremented) after
every push or pop operation as shown in
figure 2. In this case, the programmer
must set aside a portion of the main 
memory to accommodate the stack. 
Consequently, the stack capacity is 
determined by the free space in the 
main memory and is more flexible. In 
figure 2, the occupied portion of the 
stack grows from low to high memory 
addresses. Hence, the PUSH operation 
increments the stack pointer and the 
POP operation decrements it. It is not 
difficult to introduce the stack overflow 
and underflow conditions in the above 
simulation. 
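
A software stack of this moving top kind, with the overflow and underflow conditions added, might be sketched in Python as follows. The class name, the use of -1 to mark an empty stack, and the exception types are all assumptions made for the illustration.

```python
class Stack:
    """Moving-top stack: a block of memory plus a stack pointer (SP)."""

    def __init__(self, capacity):
        self.memory = [None] * capacity  # portion of memory set aside
        self.sp = -1                     # assumption: -1 means empty

    def push(self, operand):
        """SP <- SP + 1; M[SP] <- operand."""
        if self.sp + 1 >= len(self.memory):
            raise OverflowError("stack overflow")
        self.sp += 1
        self.memory[self.sp] = operand

    def pop(self):
        """operand <- M[SP]; SP <- SP - 1."""
        if self.sp < 0:
            raise IndexError("stack underflow")
        operand = self.memory[self.sp]
        self.sp -= 1
        return operand
```

With a capacity of three, pushing A, X and K fills the stack; a fourth push raises the overflow error, and the items pop in the order K, X, A.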

In another realization of stacks, a set 
of n registers constitutes a stack. Every 
POP operation takes the data item from 
the topmost register; the data in each
stack location is then shifted upward.
The PUSH operation shifts the stack 
contents down one place and adds the 
new data item. In this approach, reading 
from and writing to the data structure 
occur only with the topmost register.
Inter-register transfers can be achieved in
parallel during the same clock period.
The stack facility available with the IMP-8C
microprocessor, an example of this type,
has a capacity of sixteen words. This 
method of realization is known as the 
fixed top, as in figure 1, in contrast to the 
moving top approach explained earlier, 
as in figure 2. The flexibility associated 
with the latter can be combined with the 
speed advantage of the former as is 
done with PACE microprocessors. (See 
table 1.) 

Most modern processors provide one 
or more registers to hold stack pointers. 
For example, there is one stack pointer 
register in the Intel 8080 and there can 
be as many as sixteen stack pointers in
the RCA 1802 processor. The pop and 
push instructions update the SP registers 
automatically. The architecture and the 
stack-oriented instructions differ widely 
among the various processors, and table 
1 gives details of some of the common 
ones. 


POP:   Operand ← M[SP];
       SP ← SP - 1


PUSH:  SP ← SP + 1;
       M[SP] ← Operand


M: MEMORY
AFTER POP


Figure 2: A software simulation of the pushdown stack. Operation of
the stack is identical to the hardware stack in figure 1, except that there
is no dedicated hardware involved. Instead, a program creates a stack
pointer in memory which points to the current location at the top of the
stack.




Typical Applications of Stacks 


Suppose a routine A calls another 
routine B at some point a in A. Similarly, 
let B call C at point b. The addresses 
a+1 and b+1 are the return addresses 
where execution control will return from 
the called routine. It is evident from 
figure 3 that the return addresses are 
used in the reverse order of their se- 
quence of occurrence. The labels c1, c2, 
c3 in figure 3 stand for the first, second 
and third calling of routines, and r1, r2,
r3 stand for the first, second and third
returns from the called routines. This 
last-in first-out (LIFO) nature of the use 
of return addresses in multilevel calling 
is commonly implemented with stacks. 
Simple extensions have been devised to 
pass the parameters along with these 
return addresses using the stack struc- 
ture. 

The calls shown in figure 3 could also 
be considered as calls to service 
routines due to asynchronous interrupt 
signals. In the latter case, the return ad- 
dresses are not predetermined address 
points, but are the contents of the pro- 
gram counter. However, the last-in first- 
out nature of the return addresses re- 
mains valid. The call due to an interrupt 
creates a new process, and hence the 
status of the current process (ie: process 
status word, flags, etc) has to be addi- 
tionally saved. Some processors, like the 
IMP-8C, have instructions to push and 
pop status flags onto stacks. In other 
processors, this is done automatically 
when an interrupt occurs. Stacks in 
microprocessors, starting from the early 
Intel designs, have traditionally been 
used primarily for subroutine control 
and interrupt handling. 

Another use of stacks, though one not
much used in the hardware of processors,
is in compiling arithmetic
expressions. Consider the following
arithmetic expression:


(A+B)×C-D/E


In this form, the operator is between the 
two operands. This is known as infix 
notation. The form in which the operator 
follows the operands is called postfix or 
reverse Polish after the Polish logician J
Lukasiewicz, who investigated the
properties of this notation. The postfix 
equivalent of the above expression, 
which does not require any parentheses, 
is as follows: 


AB+C×DE/-
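
A postfix string of this kind can be evaluated with a single stack: operands are pushed as they are met, and each operator pops the top two elements and pushes the result. The Python sketch below illustrates this; the function name, the space-separated token format, and the use of `*` for multiplication are assumptions made for the example.

```python
import operator

# Operators recognized in the postfix string.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_postfix(tokens, values):
    """Evaluate a list of postfix tokens, looking operands up in values."""
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()   # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(values[token])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack.pop()
```

For example, `eval_postfix("A B + C * D E / -".split(), {"A": 2, "B": 3, "C": 4, "D": 10, "E": 5})` gives 18.0, which is (2+3)×4 - 10/5.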


Algorithms exist which use stacks to
convert arithmetic expressions from infix
to postfix notation. (See reference 2.)
Figure 4 shows sample code for the
above postfix expression; it is meant for
a computer with stacks, and is used to
evaluate arithmetic expressions. Operations
such as ADD and SUB take the top
two elements of the stack, perform the
operation, and then push the result back
onto the stack. Such a system is called a
stack computer. Using the postfix notation,
it is not hard to generate code for
machines with single accumulators or
for machines with multiple registers.


Processor   Hardware Stack or      Stack-Oriented Instructions          Remarks
            Stack Pointer

            16 bit stack pointer   Push register pair into stack
                                   Pop register pair from stack
                                   Push/Pop processor status word
                                   Exchange stack top with
                                     register pair (H,L)
                                   Load SP from register pair (H,L)

            16 bit stack pointer   All the instructions of Intel 8080
                                   Push/Pop the (two) index
                                     registers

            16 bit stack pointer   Push/Pop the (A or B) accumulator
                                   Load SP from memory
                                   Store SP into memory
                                   Transfer index register contents
                                     to SP
                                   Transfer SP into index register
                                   Increment/Decrement SP

RCA 1802    16 bit stack pointer   Increment/Decrement the selected     Any of the 16
                                     register (SP)                      registers can
                                   Push/Pop the working (D) register    be used as a SP
                                   Load the D register into left or
                                     right half of SP

            Hardware stack         Push/Pop program counter             Stack overflow/
            8 16 bit words         Push/Pop the specified register      underflow
                                   Exchange the contents of the         interrupts are
                                     register with SP                   provided
                                   Push/Pop the flag register

            Hardware stack         Push/Pop the selected accumulator    No overflow/
            16 8 bit words           into stack                         underflow
                                   Exchange the stack top with the      interrupts
                                     selected accumulator
                                   Push/Pop the status flags into
                                     the stack


Table 1: Stack features of some common microprocessors. The stack is
a storage place in a computer designed to hold pieces of data in serial
order. PUSHing an element onto the stack causes the existing elements
in the stack to be moved downward, in much the same manner as a
spring loaded plate holder found in restaurants. POPping an element
from the stack removes the most recent addition to the stack for use.
Because of these two features, the stack operation is often referred to as
last-in first-out (LIFO).


Figure 3: Diagrammatic representation of multilevel, or nested,
subroutines. The return address of each subroutine call must be
remembered so that the program can return to the right place after the
subroutine is completed. The last-in first-out nature of nested
subroutines is such that the stack is a logical way to keep track of the
return addresses.


            Contents of Stack
Op Code     (read left to right)

PUSH A      A
PUSH B      B,A
ADD         (A+B)
PUSH C      C,(A+B)
MPY         (A+B)*C
PUSH D      D,(A+B)*C
PUSH E      E,D,(A+B)*C
DIV         (D/E),(A+B)*C
SUB         (A+B)*C-(D/E)


Figure 4: Op code designed for use with Polish postfix notation on stack
oriented computers. Polish notation is a method for rewriting expressions
unambiguously by systematically segregating operators and
operands. For instance, the expression used in this example appears as
(A+B)×C-D/E in normal, or infix notation; the Polish postfix
equivalent is AB+C×DE/-. The latter can be directly used by a stack
oriented computer, which automatically performs stack operations.
(For example, a stack ADD instruction takes the top two elements of the
stack, adds them together, and pushes the result back onto the stack. The
MPY, DIV and SUB operators work in the same manner.) The algorithm
for evaluating the expression then reduces to examining each element
in the Polish notation string from left to right, pushing it onto the stack
if it is an operand and performing the operation if it is an operator.


Stack Machines


Among the architectures with
stacks, two broad categories are evident.
The first kind of machine provides
stack features along with conventional
architecture. This stack feature might be
implemented through a hardware-realized
stack, a stack-pointer register
with a set of associated hardware instructions,
or a complete software
simulation using a memory location as
the stack and its pointer. Some combinations
of these three approaches are also
present in some recent processor architectures.
Most processors have some
sort of stack facility and instructions to
manipulate data with stacks or stack
pointers.

The second kind of machine with 
stack facility can be called a stack 
machine. Its architecture is completely 
centered on stacks. The Burroughs 
B5500 and B6700, HP 3000 and ICL 2900
are examples of this category. In these
machines, the three basic functions of
process management, memory management,
and data management of jobs, are
all stack oriented. Most of these ar- 
chitectures support block-structured 
languages similar to ALGOL or PL/I. A 
program written in a block-structured 
language can be visualized as a tree 
structure; execution of the program 
traces some paths in this tree structure. 
The relationship between tree structures 
and stack data structures is well known. 
(See reference 4.) An example is shown 
in figure 5 along with various points of 
stacks holding the program variables. 
Because of the limited-access points 
with stacks, certain extensions are re- 
quired in stack machines to implement 
the array data structures. These exten- 
sions are of a different kind, such as the 
use of index registers for addressing. 
Similarly, to facilitate process and 
memory management, special software 
tools are used. 

Computer systems and architectures 
can be appraised from three points of 
view: the languages available to users 
such as application and system program- 
mers, the operating system, and the 
hardware. These three areas are highly 
interrelated, and it is difficult to 
separate their capabilities. A few stack- 
machine architectures are commercially 
available with facilities for 
multiprogramming and timesharing. The 
architecture of the Burroughs systems is 
such that the system software can be ef- 
fectively written in a high-level 
language. Stack machines have good 
and bad points. Their advantages are 
noticeable in block structured program- 
ming, which is becoming popular. As 
Doran points out, stack machines have 
proven to be successful. (See reference 
1.) The increasing cost of software and
the flexibility available through
microprogramming indicate a trend
towards stack machines or, at least, 
toward a greater use of stack features in 
computer architectures. 


Conclusions 


Developments in software and pro- 
gramming techniques during the past 
decade have proven the advantages of 
stack data structures. Microprocessors 
of recent origin provide adequate 
facilities to support this data structure. 
The provision of stack pointers is a com- 
promise between the expensive and in- 
flexible hardware stacks at one end and 
the inexpensive and flexible software 
simulation at the other end. Most 
microprocessors have stack pointers and 
a set of associated machine instructions. 


Stack machines have certain advantages
in higher-level, block-structured
programming and the implementation
of operating systems. At present, programming
with microprocessors is done
mainly at the machine- or assembly-language
level. Large in-house software
systems for microprocessors are not yet
a reality. As a result, stack-machine architectures
are still in the realm of large ...


REFERENCES 


1. Doran, R W. "Architecture of Stack
Machines" in High Level Language Computer
Architecture. Edited by Y Chu, Academic
Press 1975.

2. Gries, D. Compiler Construction for Digital
Computers. John Wiley & Sons, NY 1971.

3. Knuth, D E. The Art of Computer Programming, vol 1,
Fundamental Algorithms. Addison Wesley,
Reading MA 1968.

4. McKeeman, W. "Stack Computers." Introduction
to Computer Architecture, edited
by H S Stone, SRA Inc 1975.

5. Organick, E I. Computer System Organization:
The B5700/B6700 Series. Academic Press.


Acknowledgement 

We gratefully acknowledge the help of 
katesh, research assistant, Computer 
Science Department of Concordia 
University, in the preparation of this 
manuscript. 




X: Begin 
Integer x1, x2; 


Y: Begin 
Integer y1,y2,y3; 


Z: Begin 
Integer z1; 


K: Begin 
Integer k1,k2; 


N: Begin 
Integer n1; 


End; (N) 


End; (X) 


Figure 5: A block-structured program. Programs written in block- 
structured languages can be visualized as tree structures (figure 5a). 
ALGOL and PL/I are examples of this type of language. The tree in this 
illustration shows how the program is structured. Figure 5b shows how 
the stacks in a stack-oriented machine would look at various points of 
the program. Figure 5c shows the block layout of the program. 




Stack It Up 


riton H Allen 


Most microprocessors currently 
available employ a stack of some sort. 
This stack is either a scratch memory in 
the processor itself or an addressable, 
programmable memory characterized 
by retrieval of information in the reverse 
order of storage using a pointer. In the 
common parlance, a stack is a last-in 
first-out (LIFO) mechanism. It is a very 
useful feature for preserving the proper 
order of subroutine call and return 
events with minimal hassle. Experienced 
programmers using 8080-type machines 
quickly discover its other uses. For 
example, a direct register store instruction 
is 3 bytes long on the 8080, whereas a 
register stack instruction is only 1 byte. 
As a result, saving registers used by 
subroutines and restoring them later is 
cheaper if the stack is used in 
preference to some directly addressed 
memory area. More importantly, 
perhaps, the availability of such a 
mechanism greatly simplifies the writing 
of reentrant routines (ie: ones which do 
not modify themselves in the process of 
execution). Note, however, that all the 
mechanisms provided in microprocessors 
to date for stack operations are 
explicitly fixed mode and singular. There is 
only one stack, and it operates on entities 
of the same width, in number of 
bits, as the accumulator(s). Moreover, 
the entities have no attribute other 
than their fixed width, in bits. 

In contrast, several large-scale 
computers, such as the Burroughs 5500 
processor with which I am familiar, employ 
a more generalized stack mechanism in 
which: 

• The storage area for the stack(s) is 
independent of the central processor's 
memory (ie: not directly addressable). 
• The entities being stored and 
retrieved have attributes of type (ie: 
integer, logical, real, string, array) and 
length (ie: array size). 
• Multiple stacks may be processed 
simultaneously and independently. 


To achieve the latter, the stack con- 
troller requires a stack-control block in 
central-processor addressable memory 
to be uniquely associated with each ac- 
tive stack. Otherwise, such stack con- 
trollers bear approximately the same 
relation to the central processor and its 
addressable memory as a high-speed 
data channel. The data transfers 
are generally effected through cycle- 
stealing direct-memory addressing, and 
an unmaskable interrupt to the central 


Listing 1: PARSE, a translation procedure written in an informal ALGOL 


STRING PROCEDURE PARSE(Exp): 
STRING Exp; 
BEGIN 
    EXTERNAL INTEGER PROCEDURE Intoken; 
    LOGICAL Endinput, Errflag; 
    INTEGER Position, I, J, T; 
    INTEGER ARRAY S = ( 1 -1 -2  2 -9, 
                        3  3  4 -4 -9, 
                        5 -5 -6  6 -9, 
                       -7  7  8 -8 -9, 
                       -9 -9 -9 -9 -9); 
    STACK Q; 

    Errflag := Endinput := false; 
    PARSE := null; Position := 0; 
    I := Intoken(Exp, Position, Endinput); 
    J := Intoken(Exp, Position, Endinput); 
    COMMENT I is last token, J is current; 
    IF Endinput THEN Errflag := true 
    ELSE WHILE NOT Endinput DO BEGIN 
        T := S(I,J); IF T<0 THEN Errflag := true 
        ELSE CASE T OF BEGIN 
            COMMENT valid sequence of tokens; 
            CASE1: BEGIN 
                       Q := PARSE; PARSE := null; 
                   END; 
            CASE2: null; 
            CASE3: PARSE := PARSE . Q; 
            CASE4: PARSE := PARSE . Exp(Position) . '$'; 
            CASE5: BEGIN 
                       Q := PARSE . '$'; PARSE := null; 
                   END; 
            CASE6: PARSE := PARSE . Exp(Position); 
            CASE7: PARSE := PARSE . Q; 
            CASE8: PARSE := PARSE . Exp(Position) . Exp(Position - 1); 
        END; 
        I := J; 
        J := Intoken(Exp, Position, Endinput); 
    END; 
    WHILE NOT Q = empty DO PARSE := PARSE . Q; 
    IF Errflag THEN PARSE := null; 
END. 




processor occurs only when an error 
condition, in this case stack overflow or 
underflow, is detected. 

I don’t seriously propose such a stack 
controller for the representative 
homebrew computer system. I do pro- 
pose, however, to show by example that 
incremental programming development 
in that direction can provide 
correspondingly simpler solutions to a 
large class of computing problems. 


A Problem 


One of the curious properties of 
calculators using Polish notation tech- 
niques is that any expression using the 
operators provided on the keyboard can 
be evaluated in an absolute minimum of 
keystrokes. Moreover, the required 
number of temporary storage areas (ie: 
depth of stack) is at most the number of 
operands for the most complex 
operator. In an exactly analogous way, a 
stack of depth two or a second ac- 
cumulator is sufficient in digital com- 


Input string: 1 + (((A +B)/C) -(D*(E-F)/G))/H 


Position, token and stack columns: not legible in the source. 

PARSE (surviving entries, in order): 

null 
+1 
null 
null 
null 
-E 
-EF 
-EF*D$ 
-EF*D$/$ 
-EF*D$/$G 
-EF*D$/$G+AB/$C-$$ 
-EF*D$/$G+AB/$C-$$+1$ 
-EF*D$/$G+AB/$C-$$+1$/$ 
-EF*D$/$G+AB/$C-$$+1$/$H 


Figure 1: Sample parsing process resulting from use of program PARSE. 




puters for evaluating any size expression 
using operators corresponding to native 
instructions, provided that the terms are 
calculated in the correct order. The 
price to pay for this admittedly pleasing 
property is learning to think things through 
from the inside out. We mentally seek the 
interior of the expression, the innermost 
term in parentheses, and work outward, 
calculating left to right. The pity is that 
this does not come easily to lots of folks, 
since most people use the algebraic 
method of solving expressions, which is 
the way they were taught in school. [If a 
larger stack is used, the expression can be 
evaluated from left to right with the 
intermediate answers pushed onto the 
stack.... RGAC] 


A Solution 


The main problem with Polish nota- 
tion is really one of representation. We 
favor entering an expression in the same 
way it appears in, for example, a 
statistics handbook. If that could be 
done, if a way could be found to re- 
arrange expressions from algebraic form 
to Polish notation, a mathematical 
calculator or computer could be con- 
structed having the computational effi- 
ciency of Polish notation without 
sacrificing ease of use. In fact, this pro- 
cess of rearrangement has been intrinsic 
to most higher-level programming- 
language compilers and interpreters for 
many years. The manner of rearrange- 
ment is most easily explained in terms of 
a program that uses a stack only slight- 
ly more general than the native stack in 
microprocessors. 


Explanation 


Listing 1 is a procedure for parsing, 
computer jargon for rearranging 
generalized binary operator expres- 
sions. In somewhat less prosaic 
language: PARSE is a program which 
takes an algebraic form expression and 
rearranges it to produce a sub-Polish 
notation form expression containing 
references, where needed, to the run- 
time stack. Its output presumes that the 
result of each calculation is immediate- 
ly placed on the stack. 

Note that PARSE does not count 
parentheses. In fact, it does not even use 
them directly. Instead, it uses an exter- 
nal procedure called INTOKEN to scan 
the input expression, EXP, and produce 
encoded tokens depending on the cur- 
rent input: 


1 for a left parenthesis. 
2 for a right parenthesis. 
3 for an operator. 
4 for a constant or symbol. 
5 for none of these. 


Another peculiar property of PARSE, 
assuming you have not figured out how 
it works yet, is that only one complete 
INTOKEN scan of the input expression is 
required because the use of a stack, Q, 
permits retaining the symbols for 
intermediate expressions. INTOKEN 
recognition of parentheses (output 
values 1 and 2) effectively controls stacking 
and popping up symbols for 
intermediate expressions in the required 
order. 

The operation of PARSE depends critically 
on the array S. In use, its row 
subscript is presumed the value of the 
last INTOKEN output, its column subscript 
the value of the current INTOKEN 
output. Specifically, if the last input 
token was a left parenthesis and the 
current input token was a symbol or constant, 
then INTOKEN’s last and current 
outputs would be 1 and 4; the matching 
element in S (row 1 column 4) has value 
2 so that the statement CASE2 would 
be performed. Subsequently, J replaces 
I and INTOKEN is again invoked 
to evaluate J anew; a new element of 
S is fetched using the new values of I 
and J as subscripts; and the element of 
the CASE statement list matching the 
value taken from S is performed. 
This process is repeated until INTOKEN 
sets Endinput true, indicating the end of 
the input string Exp has been detected. 
Since the last two tokens might be right 
parentheses, and PARSE does not in fact 
process the last token since tokens are 
taken only in pairs, the stack Q is always 
emptied before PARSE finishes. 
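
The table-driven loop just described can be tried out with a short Python sketch. It is only an illustration under stated assumptions, not the author's program: `tokenize` stands in for INTOKEN and accepts single-character operands only, the previous token's text is kept in a variable instead of being fetched as Exp(Position - 1), popping an empty Q is assumed to yield the null string, and the function returns None where PARSE would return null after an error. The names `tokenize`, `parse` and `pop` are hypothetical.

```python
# Transition table S from listing 1.
# Row = code of the last token, column = code of the current token.
S = [
    [ 1, -1, -2,  2, -9],
    [ 3,  3,  4, -4, -9],
    [ 5, -5, -6,  6, -9],
    [-7,  7,  8, -8, -9],
    [-9, -9, -9, -9, -9],
]


def tokenize(expr):
    """Stand-in for INTOKEN: return (code, text) for each non-blank character.

    Assumption: every operand is a single letter or digit.
    """
    tokens = []
    for ch in expr:
        if ch == " ":
            continue
        if ch == "(":
            code = 1
        elif ch == ")":
            code = 2
        elif ch in "+-*/":
            code = 3
        elif ch.isalnum():
            code = 4
        else:
            code = 5
        tokens.append((code, ch))
    return tokens


def parse(expr):
    """Rearrange expr the way PARSE does; None signals an error."""
    tokens = tokenize(expr)
    if len(tokens) < 2:
        return None
    q = []      # the stack Q
    out = ""    # the string being built up in PARSE

    def pop():
        # Assumption: popping an empty Q yields the null string.
        return q.pop() if q else ""

    # Tokens are examined in (last, current) pairs, as I and J are in listing 1.
    for (i, last), (j, cur) in zip(tokens, tokens[1:]):
        t = S[i - 1][j - 1]
        if t < 0:
            return None                 # invalid sequence of tokens
        if t == 1:
            q.append(out)
            out = ""
        elif t in (3, 7):
            out += pop()
        elif t == 4:
            out += cur + "$"
        elif t == 5:
            q.append(out + "$")
            out = ""
        elif t == 6:
            out += cur
        elif t == 8:
            out += cur + last
        # t == 2: nothing to do
    while q:
        out += q.pop()
    return out
```

Calling `parse("1 + (((A +B)/C) -(D*(E-F)/G))/H")` walks through the same token pairs as the sample expression of figure 1.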

PARSE is presented in informal 
ALGOL only in the hope that the process per 
se of suitably rearranging algebraic form 
expressions can be made more easily 
understood than via an equivalent 8080 
assembly-language program, which 
would prove to be a transliteration 
nightmare for the novice LSI-11 or PPS-8 
programmer. Contrarily, the step by step 
[...] of PARSE and the associated [...] 
expression. The function of INTOKEN, 
recognizing and encoding the elements of 
an expression, is sufficiently straightforward 
that an explicit statement of it is 
hardly necessary, but listing 2 is includ- 


Listing 2: INTOKEN encodes the current character in the input expres- 
sion, Exp. As before, an informal ALGOL type notation is used. 


INTEGER PROCEDURE INTOKEN(Exp, Position, Endinput): 
LOGICAL Endinput; 
INTEGER Position; 
STRING Exp; 
BEGIN INTOKEN := 0; 
    IF Position = SIZE(Exp) THEN Endinput := true 
    ELSE BEGIN 
        Position := Position + 1; 
        WHILE Exp(Position) = ' ' DO Position := Position + 1; 
        IF Exp(Position) = '(' THEN INTOKEN := 1 
        ELSE IF Exp(Position) = ')' THEN INTOKEN := 2 
        ELSE IF Exp(Position) = ANY('+','-','*','/') THEN INTOKEN := 3 
        ELSE BEGIN 
            INTOKEN := 5; 
            COMMENT Presume error first, determine otherwise later; 
            IF NOT ('0' > Exp(Position) OR '9' < Exp(Position)) 
            THEN BEGIN 
                INTOKEN := 4; 
                WHILE NOT ('0' > Exp(Position) OR '9' < Exp(Position)) 
                    DO Position := Position + 1; Position := Position - 1; 
            END ELSE 
            IF NOT ('A' > Exp(Position) OR 'Z' < Exp(Position)) 
            THEN BEGIN 
                INTOKEN := 4; 
                WHILE NOT ('A' > Exp(Position) OR 'Z' < Exp(Position)) 
                    DO Position := Position + 1; Position := Position - 1; 
            END; 
        END; 
    END; 
END. 


Listing 3: Single stack-control routines written for the 8080 processor. 
STACK places a string of characters on a LIFO list, followed by the 
length of the string. POPSD removes the length of the last entered 
string, if any, from the list. POPUP removes the last entered string, if 
any, from the list. (Note: These routines are not debugged; in fact, the 
symbol STACK is multiply defined, so that it will not assemble correct- 
ly. They are included here only to suggest an appropriate technique.) 


STACK:  PUSH PSW      ; COMMENT The following presumes 
        PUSH B        ; external procedures ABUF and 
        PUSH D        ; RBUF whose functions are 
        PUSH H        ; respectively 
        XCHG          ; acquire a buffer of byte size 
        LHLD STACK    ; specified by A, returning 
        PUSH H        ; address in H,L or zero if 
        POP B         ; none available 
        ADI 3         ; release a buffer addressed by 
        CALL ABUF     ; H,L to the buffer pool ; 
        MOV A,H       ; 
        ORA L         ; STACK: SAVE(H,L); 
        JZ STKOF      ; ABUF(A+3); IF 0 
        SHLD STACK    ; THEN SET(Carry) 
        MOV A,C       ; ELSE BEGIN 
        STAX H        ; COMMENT Stack entry contents; 
        INX H         ; +0 address of previous entry 
        MOV A,B       ; +2 size of current item 
        STAX H        ; +3 current item 
        INX H         ; 
        POP PSW       ; caller provides size in A, 
        MOV B,A       ; item data address in H,L; 
        STAX H        ; RESET(carry); 
        ORA A         ; MEMORY(H,L) := Stack; 
        JZ STKCX      ; Stack := (H,L); 
        INX H         ; (H,L) := (H,L) + 2; 
STKCY:  LDAX D        ; memory(H,L) := A; 




Listing 3, continued: 


        STAX H        ; (H,L) := (H,L) + 1; 
        INX H         ; RESTORE(D,E); SAVE(D,E); 
        INX D         ; WHILE NOT A=0 DO 
        DCR B         ; BEGIN 
        JNZ STKCY     ;   MEMORY(H,L) := MEMORY(D,E); 
STKCX:  POP H         ;   (H,L) := (H,L) + 1; 
        POP D         ;   (D,E) := (D,E) + 1; 
        POP B         ;   A := A - 1; 
        STC           ; END; 
        CMC           ; RESTORE(H,L); 
        RET           ; 
STKOF:  POP H         ; 
POPUF:  POP D         ; 
        POP B         ; 
        POP PSW       ; 
        STC           ; 
        RET           ; 
POPSD:  PUSH H        ; POPSD: IF Stack = 0 


        STC           ;   THEN SET(Carry) 
        LHLD STACK    ;   ELSE BEGIN 
        MOV A,H       ;     COMMENT Give caller size 
        ORA L         ;     of next entry to pop, for 
        JZ POPZD      ;     buffering as needed ; 
        INX H         ;     RESET(Carry); 
        INX H         ;     SAVE(H,L); 
        CMC           ;     (H,L) := Stack + 2; 
        LDAX H        ;     A := MEMORY(H,L); 
        JMP POPXD     ;     RESTORE(H,L); 
POPZD:  SUB A         ;   END; 
POPXD:  POP H         ; 
        RET           ; 
                      ; The following must be in R/W 
                      ; memory, since Stack is the 
                      ; list-origin address, and LHLI 
                      ; is externally modified to 
                      ; effect an indirect LHLD. 
LHLI:   LHLD 0        ; 
        RET           ; 
STACK:  0             ; 
POPUP:  PUSH PSW      ; POPUP: IF Stack = 0 
        PUSH B        ;   THEN SET(Carry) 
        PUSH D        ;   ELSE BEGIN 
        PUSH H        ;     COMMENT Target area is 
        LHLD STACK    ;     specified by caller H,L; 
        XCHG          ;     RESET(Carry); 
        POP H         ;     SAVE(D,E,H,L); 
        MOV A,D       ;     (D,E) := Stack; 
        ORA E         ;     B := MEMORY(D,E + 2); 
        JZ POPUF      ;     SAVE(D,E,H,L); 
        PUSH H        ;     (D,E) := (D,E) + 3; 
        PUSH D        ;     WHILE NOT B=0 DO 
        INX D         ;     BEGIN 
        INX D         ;       COMMENT Zero-length entries 
        LDAX D        ;       are removed but not copied ; 
        ORA A         ;       MEMORY(H,L) := MEMORY(D,E); 
        JZ POPCX      ;       (D,E) := (D,E) + 1; 
        INX D         ;       (H,L) := (H,L) + 1; 
        MOV B,A       ;       B := B - 1; 
POPCY:  LDAX D        ;     END; 
        STAX H        ;     RESTORE(D,E,H,L); 
        INX H         ;     Stack := MEMORY(D,E); 
        INX D         ;     RBUF(D,E); 
        DCR B         ;     RESTORE(D,E,H,L); 
        JNZ POPCY     ;   END; 
POPCX:  POP D         ; 
        XCHG          ; 
        SHLD LHLI+1   ; 
        CALL LHLI     ; 
        SHLD STACK    ; 
        LHLD LHLI+1   ; 
        CALL RBUF     ; 
        POP H         ; 




ed nonetheless in informal ALGOL. The 
remaining question, perhaps, is one of 
making the stack Q of PARSE operable 
on a microcomputer. To that end, listing 
3 shows a hypothetical implementation 
of single stack control routines STACK, 
POPUP, and POPSD using 8080 
assembler format. 
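
For readers who would rather see the data structure than the assembly code, the entry layout described in the comments of listing 3 (address of the previous entry, one-byte size, then the item) can be sketched in Python. This is a rough model only: the class name `StringStack` and its method names are hypothetical, a Python object stands in for each buffer obtained from ABUF, and a return value of None plays the part of the carry flag that the 8080 routines set when the list is empty.

```python
class StringStack:
    """LIFO list of variable-length strings, one buffer per entry."""

    def __init__(self):
        self.top = None   # plays the role of the STACK list-origin word

    def push(self, item):
        """Like STACK: link a new entry in front of the previous one."""
        if len(item) > 255:
            raise ValueError("size must fit in one byte")
        self.top = {"prev": self.top, "size": len(item), "item": bytes(item)}

    def next_size(self):
        """Like POPSD: size of the entry a pop would return, None if empty."""
        return None if self.top is None else self.top["size"]

    def pop(self):
        """Like POPUP: unlink the last entry and hand back its contents."""
        if self.top is None:
            return None
        entry = self.top
        self.top = entry["prev"]   # the buffer would go back to RBUF here
        return entry["item"]
```

A caller would typically ask `next_size()` first, set aside a target area of that length, and then call `pop()`, which is the division of labor the listing suggests for POPSD and POPUP.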

Now what? Well, for a start let’s 
observe that PARSE will work only with 
binary operator expressions. Right? 
Well, not quite. Note that PARSE passes 
the buck for recognition. If INTOKEN 
can recognize unary operators, it can 
also stuff in a dummy operand on the 
fly, since PARSE initializes Position, and 
thereafter leaves it alone. That is, the 
common unary operators are special 
cases of a binary operator and either 0s 
or 1s: NOT FRED is equivalent to ones 
exclusive-OR FRED; NEGATIVE VIBES is 
equivalent to 0 - VIBES; and INVERSE 
HYPOTHESIS is equivalent to 
1/HYPOTHESIS. 


How about the results? PARSE can 
easily be modified to directly generate 
machine-language code if INTOKEN is 
modified to create or at least have ac- 
cess to a symbol table; or its output can 
be used, as is, by an interpretive 
calculator program. Obviously, 8080 
machines and, for that matter, most 
microprocessors lack multiply and 
divide instructions, but nonnative opera- 
tions can easily be interpreted as 
operator subprogram calls. PARSE 
makes no presumption about the com- 
puter on which it is run except the 
availability of a stack to use with its out- 
put referenced by ‘$’. The operators, for 
example, for which PARSE was 
developed in the form shown were 
character string operators of combina- 
tion and proximity. The PARSE output 
was interpreted by a program for search- 
ing large textual files on an IBM 
System 360 disk unit. The point is that 
the results are what you make of them, 
PARSE being no more than a procedure 
for rearrangement of expressions. 

A final apology before getting under 
way. FORTRAN programmers may by 
now have noticed an “error” in that 
although the tokens 1 and H in the exam- 
ple of figure 1 are at the same paren- 
thesis level, the add-1 parse precedes 
the divide-H in the final step. Why? I 
prefer to ask why one bothers with 
operator priorities so long as the desired 
order of computation can be explicitly 
specified by using parentheses. The ex- 
ample of figure 1, in fact, was contrived 
in part to illustrate that PARSE as shown 


presumes a strict left to right 
evaluation at any parentheses level. 
Operators are not “ranked” as in FOR- 
TRAN and several other higher-level pro- 
gramming languages. 


More Time 


If the available stack mechanism is 
only once more generalized, to provide 
multiple stacks simultaneously, some 
conceptual simplification of a large 
class of problems occurs. As a near 
trivial example, we illustrate in listing 4 
a two-stack sorting procedure. In 
essence, it removes records (ie: strings) 
from a file one at a time and 
manipulates the two stacks, Highside 
and Lowside, back and forth until the 
record fits in the inclusive interval 
of values bounded by the top elements 
of the two stacks. The procedure has 
two virtues: 

• It is easy to describe and understand. 
• It requires an absolute minimum of 
[...] space. 

The price we pay is speed. It is probably 
one of the two or three slowest sorting 
algorithms around. 
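
The back-and-forth shuffling is perhaps easiest to follow in a few lines of Python. The sketch below is an illustration, not a transcription of listing 4: Python lists stand in for the two stacks, and instead of seeding the stacks with the first two records it simply treats an empty stack as having no bound. The function name `two_stack_sort` is hypothetical.

```python
def two_stack_sort(records):
    """Sort records using only two stacks, Lowside and Highside."""
    low = []    # top of low is the largest of the small records
    high = []   # top of high is the smallest of the large records
    for this in records:
        # Shuffle records across until `this` fits between the two tops.
        while low and this < low[-1]:
            high.append(low.pop())
        while high and this > high[-1]:
            low.append(high.pop())
        high.append(this)
    # Move everything onto Highside; its top is then the smallest record.
    while low:
        high.append(low.pop())
    return [high.pop() for _ in range(len(high))]
```

For example, `two_stack_sort(["pear", "apple", "fig"])` pops the records off Highside in ascending order. The slowness the article mentions shows up when records arrive in alternating high and low order, since nearly everything is then moved across on each insertion.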


The program examples which appear 
in this article are written in an informal 
ALGOL type notation. The basic unit of 
ALGOL is the statement. It can be 
either a simple statement such as: 


Position := 0; 


which is read “position is evaluated as 
0,” or a compound statement defined 
by BEGIN . . . END such as: 


BEGIN 
Q := PARSE; PARSE := null; 
END 


which is read “Q is evaluated parse, 
PARSE is evaluated null.” 

The statements defined between the 
BEGIN and END statements are not 
restricted to type. A preceding conditional 
such as (IF . . . THEN . . . ELSE) 
will affect the entire compound statement. 
One of the constituents of the 


Listing 3 continued 


        POP D         ; 
        POP B         ; 
        POP PSW       ; 
        STC           ; 
        CMC           ; 
        RET           ; 


Listing 4: A SORT procedure expressed in informal ALGOL type notation 
demonstrates use of two stacks. 


STRING ARRAY PROCEDURE SORT(File): 
STRING ARRAY File; 
BEGIN 
    INTEGER K; 
    STRING This; 
    STACK Highside, Lowside; 

    Lowside := File(1); 
    Highside := File(2); 
    COMMENT top function references item 
        on the top of some stack; 
    IF TOP(Lowside) > TOP(Highside) 
    THEN BEGIN 
        This := Highside; 
        Highside := Lowside; 
        Lowside := This; 
    END; 
    COMMENT size function produces the 
        current number of elements in array; 
    K := 3; 
    WHILE K ≤ SIZE(File) DO 
    BEGIN 
        This := File(K); 
        K := K + 1; 
        WHILE This < TOP(Lowside) DO Highside := Lowside; 
        WHILE This > TOP(Highside) DO Lowside := Highside; 
        Highside := This; 
    END; 
    WHILE NOT(Lowside = empty) DO Highside := Lowside; 
    K := 1; 
    WHILE K ≤ SIZE(File) DO 
    BEGIN 
        SORT(K) := Highside; 
        K := K + 1; 
    END; 
END. 


statement may well be another com- 
pound statement. For example, to add 
an array of samples having subscripts 1 
through Limit which is specified 
elsewhere we could write: 


BEGIN 
Subscript := 1; Sum := 0; 
WHILE Subscript ≤ Limit DO 
BEGIN 
Sum := Sum + Sample(Subscript); 
Subscript := Subscript + 1; 
END; 
END; 


The WHILE statement’s operand (the 
statements after the DO) is, rather 
intuitively, in execution so long as the 
conditional part (Subscript ≤ Limit) is 
true. 

The CASE statement is simpler in effect. 
It acts approximately like an indexed 
jump. It has two operands. The 
first of these (T in the PARSE procedure) 
is an integer, and the second is 
a list of statements bracketed by BEGIN 
and END. The statement performed is the 
one whose position in the list 
matches the value of the index 
specifier. 

Following are the informal extensions 
that have been made to ALGOL and 
used in the programs: 


• The period indicates concatenation 
of character strings. Presuming 
values of ‘WHAT’ and ‘STUFF’ for 
symbols A and B, A . B will have a 
value of ‘WHATSTUFF.’ 

• Q is declared to be of type STACK 
which, however implicit in most 
implementations of ALGOL-60, was 
not construed to be explicitly 
available. It is, in effect, a LIFO 
indexed character-string array. 

• Null and empty are used for assigning 
values, respectively, of a 
character string of length 0 and a 
stack having 0 entries. 


Busy work! It’s a terrible thing to inflict 
on people or computers. Wait loops 
for input or output operations are busy 
work for computers, and unless you 
know how to tap your computer on the 
shoulder when you need it, it will probably 
spend most of its time doing busy 
work. 

As hobbyists, we are always concerned 
about squeezing the greatest 
[...] out of our investments. We want 
our computers to run as efficiently as 
possible. Since it is likely that we will be 
involved in designing and building some 
of our own input/output (I/O) devices, 
we should develop an understanding of 
the concept of interrupts. To efficiently 
[...] peripherals for I/O purposes it 
is necessary to use interrupts. 

This article introduces the basic concepts 
of interrupts, defines the 
terminology that applies to interrupt 
mechanisms, and describes the processing 
events that must occur during the 
[...] from the receipt of an interrupt to 
the return from that interrupt. 


Interrupts 


An excellent example of interrupt 
processing is the system used in telephones. 
Let’s see why. 

We know when someone is trying to 
call us because the telephone rings. But 
consider how much time would be 
wasted if we had to periodically pick up 

What Is an Interrupt? 


R Travis Atkins 


the receiver to see if anyone was on the 
telephone if the phone had no bell. This 
periodic method is called polling; it 
works well for telethons and radio talk 
shows. However, it’s not the best 
method for normal home telephone in- 
stallations. Assume you receive an 
average of one or two phone calls a day 
at home. Imagine yourself as a pro- 
cessor and the callers as the I/O requests 
from a keyboard. The order of 
magnitude differences in this example 
are about the same as with your pro- 
cessor and its I/O. The bell on your 
telephone is, of course, an interrupt. It is 
an excellent way to resolve the asyn- 
chronism and speed differentials of the 
telephone communications system. In- 
terrupts can resolve the same fundamen- 
tal mismatches for computers as well. 


Terminology 


Let’s carry this analogy a little further 
to introduce the terminology that refers 
to variations of the basic interrupt con- 
cept. 

If your phone is ringing, and you are 
about to process an interrupt, what are 
your reactions? How does your interrupt 
processing work? 

More than likely, you will perform a 
sequence of actions precisely analogous 
to those your microprocessor performs 
when it receives an interrupt. Figure 1 is 
a flowchart of the typical procedure we 




would run through for a telephone inter- 
ruption. It consists of both the 
housekeeping chores of switching from 
the background task you were doing, 
reading, to the interrupt task, answering 
the phone, and back again in an orderly 
and complete fashion, as well as the in- 
terrupt handler itself. 

In the computer an interrupt is a 
special control signal that is sent to your 
microprocessor when a given asyn- 


INTERRUPT OCCURS 
FINISH READING SENTENCE 
PHONE CALL? 
HOME PHONE OFF HOOK? 
PLACE BOOKMARK 
LIFT PHONE OFF HOOK 
SELECT APPROPRIATE RESPONSE 
RETURN TO MARKED PLACE IN BOOK 
GO TO NEXT SENTENCE 

Figure 1: One example of a way to handle a human being’s interrupt pro- 
cessing in response to a ringing bell. The ringing bell is like a signal on a 
multiple source interrupt line of a computer. The first object is to iden- 
tify the source of the interrupt. If processing is done, the state of the in- 
terrupted process (eg: reading a book) is saved (eg: with a bookmark) 
while the phone is answered. After the phone call, the reading of the 
book may be resumed at the place of the bookmark, restoring the 
original process. 




chronous event, such as a switch 
closure, or an I/O ready signal, is 
detected by your system. It is the 
mechanism that forces the processor to 
take note of that exceptional event. 

The interrupt causes your processor 
to transfer control to a set of instruc- 
tions known as the interrupt handler. The 
interrupt handler is nothing more than a 
precoded contingency plan in the form 
of a subroutine that may be called at 
any time in response to an interrupt 
signal. What you do in this subroutine is 
limited only by the software capabilities 
of your processor. 

In microprocessors, interrupt process- 
ing is basically a software technique 
with varying degrees of hardware sup- 
port depending on your particular pro- 
cessor and system. The term vectored in- 
terrupt refers to a simple method of 
reacting to an interrupt. The processor is 
sent to the interrupt handler which will 
lead the processor through steps to 
determine the source of the interrupt, in- 
itiate appropriate actions and return to 
the point of interruption. The vector is 
simply the starting address of the inter- 
rupt handler and is supplied either exter- 
nally or internally, depending upon your 
particular processor's hardware. 

Microprocessor integrated-circuit 
designers who seek to minimize the 
hardware requirements in their pro- 
cessors often assign a fixed location or 
set of locations in the processor’s ad- 
dress space to hold the vector(s). The 
6800 processor uses this approach, as 
does the Texas Instruments TMS-9900. 
Other processors such as the 8080 
receive their vectors directly from exter- 
nal sources, a method which usually in- 
volves more system hardware. 
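
As one concrete case of an externally supplied vector: on the 8080 the interrupting hardware commonly places a one-byte RST (restart) instruction on the data bus, and the three bits in the middle of that opcode select one of eight fixed handler addresses spaced 8 bytes apart. The short Python sketch below only works out that arithmetic; the function names are hypothetical and nothing here models the bus timing.

```python
def rst_opcode(n):
    """Opcode for RST n on the 8080: binary 11 NNN 111."""
    if not 0 <= n <= 7:
        raise ValueError("RST number must be 0 through 7")
    return 0xC7 | (n << 3)


def rst_target(opcode):
    """Handler address selected by an RST opcode: 8 times N."""
    if opcode & 0xC7 != 0xC7:
        raise ValueError("not an RST opcode")
    return opcode & 0x38
```

The external hardware thus needs only to supply one byte, yet the processor ends up at one of eight different handler entry points.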

The built-in process that occurs in 
your microprocessor integrated circuit is 
usually limited to saving the program 
counter and the processor's status 
register, masking subsequent interrupts, 
and then transferring control to your in- 
terrupt handler (ie: loading the program 
counter with the interrupt handler’s 
starting address). The task of determin- 
ing where the interrupt came from is left 
to the interrupt handler itself. In the 
simplest case, where you have only one 
device tied to an interrupt line, the 
origin of the interrupt is implied. Since 
we may at some time have more devices 
than we have separate interrupt lines, 
we should also know how to make a 
more sophisticated system capable of 
handling many devices. 

To see how multiple interrupts are 
handled, consider another analogy. 


Assume you had just settled back into 
your easy chair after finishing with the 
telephone interruption, when suddenly a 
pair of hands covers your eyes and a 
voice says, “Guess who?” You’ve been 
interrupted again, and you don’t know 
which of your twelve children it is, so 
you will have to save your place again, 
and begin by saying, “Is that you 
Helen?”... “No.”... “Is it you Travis?”... 
“No.”... “Is it you Mary Ellen?” 
etc, until you get a positive response. 
In much the same way, several devices 
can use a single, common interrupt line 
to your processor so that, once the interrupt 
handler is initiated, it can 
interrogate all the devices to see which one 
sent the interrupt signal. To accomplish 
this, it is customary to have a device 
status register in the microprocessor’s 
address space for each individual 
device. The data in this location 
indicates the device’s current status: busy 
or ready. 

Now suppose this game of guess-who 
is very popular with your children, and 
they are all playing it on you, some 
much more often than others. Your best 
strategy would probably be to adopt an 
ordering scheme to optimize the handling 
of these many interruptions. This 
simply means that you would guess the 
names of the children who were the 
most frequent players first and check 
the least likely ones last. Similarly, in 
interrupt processing you should arrange 
the order of checking the device status 
registers of your I/O units from the most 
frequent source of interrupts to the least 
frequent. By ordering your interrupt 
servicing this way, you can add significantly 
to the efficiency of your system. 
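
The guess-who search over device status registers can be sketched in a few lines of Python. This is only an illustration of the ordering idea: the device names, the dictionary standing in for the status registers, and the function name `find_interrupt_source` are all hypothetical.

```python
def find_interrupt_source(status_registers, poll_order):
    """Check devices in poll_order; return (device, number of checks made).

    status_registers maps a device name to "busy" or "ready".
    """
    for checks, device in enumerate(poll_order, start=1):
        if status_registers[device] == "ready":
            return device, checks
    return None, len(poll_order)


registers = {"keyboard": "ready", "printer": "busy", "tape": "busy"}

# The keyboard interrupts most often, so it is worth asking it first.
by_frequency = ["keyboard", "printer", "tape"]
arbitrary = ["tape", "printer", "keyboard"]

found_fast = find_interrupt_source(registers, by_frequency)
found_slow = find_interrupt_source(registers, arbitrary)
```

Both searches find the keyboard, but the frequency-ordered list needs one check where the arbitrary order needs three. Multiplied over many interrupts, that difference is the efficiency gain the text refers to.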

By now you’ve probably realized that 
the idea of letting multiple devices 
share one interrupt line is problematical: 
two devices may want to interrupt the 
processor at the same time. In terms of 
our analogy, all twelve of your children 
may want to play the guess-who game 
with you at the same time. The way to 
handle this situation is to say, “Hold it! I 
will play the game with each of you, 
but only one at a time.” By doing this 
you act on one interrupt while you mask 
out the rest. 

The concept of maskable interrupts is 
incorporated in many of today’s 
processors. There is usually a mask bit or 
bits used to block or mask the interrupt 
signal from the processor. This masking 
is frequently part of the built-in process 
on your microprocessor chip to protect 
the function of saving critical informa- 
tion, such as the program counter and 


status register, from subsequent inter- 
rupts. Once masked out, your system’s 
design will determine if a subsequent in- 
terrupt will be held pending or lost. In- 
terrupts that are kept pending are often 
referred to as queued interrupts. 
Sometimes circuitry external to the pro- 
cessor chip itself is used to give the 
pending interrupts an order of priority in 
much the same way as you might tell 
your children to line up in the order of 

INTERRUPT OCCURS 
FINISH CURRENT INSTRUCTION 
MASKABLE INTERRUPT? 
STORE PROGRAM COUNTER AND REGISTERS 
MASK OFF OTHER INTERRUPTS 
LOAD INTERRUPT VECTOR INTO PROGRAM COUNTER 
EXECUTE INTERRUPT ROUTINE 
RETURN FROM INTERRUPT 
RESTORE PROGRAM COUNTER AND REGISTERS 
FETCH NEXT INSTRUCTION 


Figure 2: Analogous to the human interrupt processing of figure 1, the 
typical computer's interrupt processing activities are shown by this 
chart. The differences between figures 1 and 2 are largely in the activities 
described in each box; the form of the processing logic in this particular 
set of examples is nearly identical. 




Table 1: The following list of responsibilities must be jointly met by 
both the programmer and the system programmed, if interrupts are to 
be properly handled. The key item to remember is that when the 
interrupt occurs, the critical data values determining the state of the 
machine must be saved so that at the end of the interrupt process, the 
original process can be resumed as if the interrupt never happened. 


Sensing an interrupt signal and determining 
appropriate response. 

Setting the mask to protect the processor 
from subsequent interrupts. 

Noting where you are by saving the program 
counter and status register. 

Transferring processor control by loading the 
program counter with the interrupt vector 
address. 

Executing the interrupt handler which may: 

    save accumulator(s). 
    save index register(s). 
    save pointer(s). 
    search for interrupt source. 
    satisfy device request. 
    restore pointer(s). 
    restore index register(s). 
    restore accumulator(s). 
    clear the interrupt mask. 

Resuming normal processing by restoring the 
program counter and the processor status 
register. 
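
The sequence in table 1 can be walked through with a toy model in Python. It is a sketch for illustration and does not describe any particular microprocessor: the class `TinyProcessor`, its fields, and the idea of passing the handler in as a function are all assumptions, and a request that arrives while the mask is set is simply refused here, whereas a real system might hold it pending.

```python
class TinyProcessor:
    """Toy model of the interrupt sequence; not any real chip."""

    def __init__(self, vector):
        self.pc = 0            # program counter
        self.status = 0        # processor status register
        self.masked = False    # interrupt mask bit
        self.vector = vector   # starting address of the interrupt handler
        self.saved = []        # stack holding the interrupted state

    def interrupt(self, handler):
        if self.masked:
            return False                           # refused while masked
        self.masked = True                         # set the mask
        self.saved.append((self.pc, self.status))  # note where you are
        self.pc = self.vector                      # transfer control
        handler(self)                              # execute the handler
        self.masked = False                        # clear the mask
        self.pc, self.status = self.saved.pop()    # resume where we left off
        return True


cpu = TinyProcessor(vector=0x0038)
cpu.pc, cpu.status = 0x1234, 0x01
seen = []


def handler(p):
    seen.append(p.pc)                          # running at the vector address
    p.status = 0xFF                            # handler disturbs the status
    seen.append(p.interrupt(lambda q: None))   # second request is masked out


serviced = cpu.interrupt(handler)
```

After the call the background state is exactly what it was before the interrupt, which is the "as if the interrupt never happened" requirement stated in the table caption.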


PROCESSOR HARDWARE 
    RECOGNIZE INTERRUPT 
    MASK OTHER INTERRUPTS 
    NOTE WHERE [...] 

EXTERNAL HARDWARE 
    DECIDE WHICH INTERRUPT 
    LOAD DEVICE STATUS REGISTERS 

VECTOR TO INTERRUPT HANDLER 

SOFTWARE 
    DETERMINE ORIGIN OF INTERRUPT 
    DETERMINE WHAT EVENT OCCURRED 
    TRANSFER CONTROL TO APPROPRIATE [...] 

BACKGROUND PROGRAM 

COMPLETION OF INTERRUPT PROCESS 


Figure 3: A division of the functions at an interrupt between hardware 
and software is detailed in this diagram. The exact boundaries are often 
set by the system’s hardware and software design details. 




youngest to oldest to play the guess-who 
game. The N-level priority interrupt 
capabilities that are mentioned as 
features of microprocessor systems refer 
to this type of interrupt queuing. A 
higher-priority interrupt that arrives 
after several low-priority ones will usual- 
ly bump the lower-priority interrupts 
down in the queue. 

For those cases where an interrupt 
must get the processor's attention right 
away, a nonmaskable interrupt is usually 
also provided in the integrated circuit's 
structure. This control line is for a very 
high-priority function of your choice, 
which can override the maskable inter- 
rupts even if they are in progress. This is 
valuable for very high-speed I/O, such as 
a floppy-disk unit, and for hardware 
emergencies such as fire or power-loss 
routines. Your system reset is usually a 
nonmaskable interrupt. 


Mechanisms 


Now that you have a feel for the ter- 
minology, let’s take a look at the 
mechanisms and processing that are 
common to all interrupt routines. Figure 
2 is a typical flowchart of the functions 
necessary to accomplish the transfer of 
control from the background processing 
to the interrupt handler and return. You 
may think of this as putting the 
background process on hold while the 
interrupt is processed and recommenc- 
ing the background process when it 
returns. The background process is not 
affected by what has happened; thus the 
interrupt processing is completely 
transparent to the background process 
and may be executed at any time 
without fear of disturbing it. The only 
definite change is that the background 
processing will slow down somewhat 
because the processor has to take extra 
time to service the interrupt(s). As a 
result, any real-time clocking in the 
system will be offset by that interrupt 
processing time. 

The joint responsibility of the pro- 
cessor and you, the interrupt routine 
programmer, is to insure that the 
background process is not disturbed. 
These responsibilities are simply stated 
in table 1. 
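
The responsibilities in table 1 can be sketched as a toy simulation. This is only an illustration under stated assumptions: the TinyCPU class, its single accumulator, the vector address 0x0038, and the handler are all hypothetical and do not model any particular processor.

```python
class TinyCPU:
    """Hypothetical toy processor, used only to show the save/restore order."""

    def __init__(self):
        self.pc = 0x0100        # program counter of the background process
        self.status = 0x00      # processor status register
        self.acc = 0            # one working register
        self.masked = False     # interrupt mask
        self.stack = []

    def interrupt(self, vector, handler):
        if self.masked:
            return False                            # request ignored while masked
        self.masked = True                          # set the mask
        self.stack.append((self.pc, self.status))   # note where we are
        self.pc = vector                            # transfer control
        self.stack.append(self.acc)                 # handler saves what it uses
        handler(self)                               # satisfy the device request
        self.acc = self.stack.pop()                 # handler restores registers
        self.masked = False                         # clear the mask
        self.pc, self.status = self.stack.pop()     # resume normal processing
        return True


def noisy_handler(cpu):
    cpu.acc = 0xFF    # the handler clobbers the accumulator while it works


cpu = TinyCPU()
cpu.acc = 42
before = (cpu.pc, cpu.status, cpu.acc)
cpu.interrupt(0x0038, noisy_handler)
after = (cpu.pc, cpu.status, cpu.acc)
print(before == after)    # the background process sees no change
```

The stack is used in strict last-in, first-out order, which is why the restore steps in table 1 mirror the save steps in reverse.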

Within the interrupt handler it is not
always necessary to save all of the working
registers for every interrupt, but you
must at least save and restore every
register used in your routine.

Deciding when to remove the interrupt
protection (ie: clear the mask) is
your responsibility. The key is to pick a
point which comes after the saving of
critical registers. The mask shouldn't
be removed in the middle of your interrupt
routine unless the routine is
reenterable. Reentrancy is a term that
applies to software routines that find new
locations to store their working
data each time that they are reentered
before they have been exited. The
significance of this is that if you clear
the mask and a subsequent interrupt
arrives, stops your current interrupt
processing, and begins to use the same
interrupt routine you were just using, you
must ensure that it does not destroy
your current working data. The safest
procedure is to stay masked throughout
the interrupt processing until you
become experienced with the reentry
techniques. Figure 3 shows the
division of these interrupt processing
functions for a typical microcomputer
system.
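
The reentrancy problem described above can be shown with a small sketch. Everything in it is a hypothetical illustration: the two summing routines, the shared_scratch location, and the nested call that stands in for a second interrupt arriving after the mask has been cleared.

```python
shared_scratch = [0]    # one fixed working location shared by every activation


def sum_nonreentrant(values, interrupt_at=None, nested=None):
    """Keeps its working data in a fixed location: not reenterable."""
    shared_scratch[0] = 0
    for i, v in enumerate(values):
        if i == interrupt_at and nested is not None:
            nested()                # a second activation arrives mid-computation
        shared_scratch[0] += v
    return shared_scratch[0]


def sum_reentrant(values, interrupt_at=None, nested=None):
    """Gets fresh working storage on every entry: reenterable."""
    total = 0
    for i, v in enumerate(values):
        if i == interrupt_at and nested is not None:
            nested()
        total += v
    return total


bad = sum_nonreentrant([1, 2, 3], interrupt_at=2,
                       nested=lambda: sum_nonreentrant([10, 20]))
good = sum_reentrant([1, 2, 3], interrupt_at=2,
                     nested=lambda: sum_reentrant([10, 20]))
print(bad, good)    # the correct sum of 1, 2, 3 is 6
```

The first routine returns a wrong total because the nested activation overwrote its working data; the second is unaffected because each entry has its own storage.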

If your processor's monitor was sup- 
plied by the manufacturer, there is much 
to be learned from studying its interrupt 
handler section. Look for the methods 
used to accomplish the basic steps we 
have outlined above, then write your 
own simple interrupt handler, modify 
the interrupt vector to point to your 
routine instead of theirs, and execute 
your interrupt handler. 

Once you have done this successfully, 
you will have developed an appreciation 
of how the modern digital computer, 
large or small, appears to simultaneous- 
ly service the requests of so many 
peripheral devices. Understanding inter- 
rupt driven processing, which is the cen- 
tral concept of computer operating 
systems, will help you to grasp the 
awesome power that lies within your 
own personal computing system. 




A Little Bit on Interrupts


Robert R Wier


While talking with fellow enthusiasts
attending meetings of computer clubs,
there seem to be several aspects of
small computer systems which are
particularly confusing to newcomers to the
hobby. One of these is interrupts. This
article explains how the mechanisms of
interrupts work, and what can be done
with them in a personal computer
system.

When computers first came into widespread
use, they ran primarily on card or
batch principles. The operator had
lists of instructions telling him
which card decks to use to run the
specific jobs. Each job had to be set up
independently, which was acceptable as
long as this setup time was short in
relation to the amount of time each job ran.
The desired goal was to keep the
machine running as much as possible. As
technology advanced and job run times
became shorter, setup time became a
significant fraction of the total job run
time. It was clear that if the machine
could take over some of the chores of
the operator, but at machine speed, the
utilization of the system could be
increased.

Accounting and setup procedures
could be accomplished by programs
stored inside the machine, and then the
computer could request the operator to
perform only those duties that actually
required human intervention (eg: mounting
a disk pack). Thus programs called
operating systems came into use.

About this same time, it was realized 
that if such a machine were going to run 
jobs under an operating system, there 
had to be some way to return control to 
the operating system should the pro- 
gram encounter difficulties. That is, the 
operator should be able to jerk control 
of the machine away from the program 
currently running and give it to the 
operating system without having to go 
through the process of clearing the 
machine and reloading the operating 
system manually each time. 

Another problem emerged at this time:
as the central processing unit improved
in efficiency due to the faster technology,
the devices used for input and output,
called peripherals,
remained at about the same speed.
Therefore, if the central processing unit 
had to wait for the completion of an in- 
put or output operation, it would just sit 
there testing and retesting to see if the 
program could proceed. This was fre- 
quently called a busy wait loop, or spin 
lock. It is a technique which is still fre- 
quently used in microprocessor systems. 

Clearly, since input/output (I/O)
operations were so slow, it would be
nice if the processor could simply
request the I/O hardware to input to or
output from memory directly without
processor intervention. Then the
processor could go on and perform useful
computations while the I/O operation
was in progress. Of course this required
considerably more sophisticated
input/output hardware than was in use
previously, when the processor
orchestrated every data transfer. But since
the I/O hardware didn't need to be able
to perform complicated arithmetic
functions, it could be regarded as a
minicentral processing unit or microprocessor.
Indeed, the original purpose of the 
microprocessor chip which has made 
our hobby possible was to produce 
reasonably smart peripheral systems at 
low cost. That is, each I/O channel 
would have its own smaller processor 
to handle only data transfers be- 
tween an I/O device and memory. A lit- 
tle thought will reveal a problem, 
however. If the processor simply starts
an I/O operation and then pursues other
matters, how does it know when the
operation is finished, so that it may use
the data input or refill the buffer just
output? What if the drive mangles the
tape and the data has to be output or
input again? What was needed was the
ability for the I/O processor to
tap the central processor on the
shoulder and say, “I'm finished,” or “I
fouled it up.”

There was also the problem of real-time
applications which, depending on
the system, needed the computer to be
able to detect some condition, make a
decision, and act on it quickly. If you
had a big busy wait loop, where several
instructions were executed in the loop
between each checking of the status of
each separate input signal, your
refinery's catalytic converter might go
critical before the computer even
checked to see if something was wrong,
a disquieting development.


[Figure 1 diagram. Legible labels: interrupt source and request; external
logic transforms the interrupt into an RST instruction; 8080 processor;
memory address space; return pointer only goes to stack at interrupt.]

Figure 1: What happens when an 8080 interrupt occurs? The interrupt signal
occurs first at some external device. Then, external circuitry creates an
interrupt signal and sends a restart (RST) instruction to the processor as
the second step. As a third step the old program counter information is
saved on the stack to allow later return. Then the fourth step, part of
executing the RST instruction, is to jump to one of eight possible restart
locations in the first 64 bytes of memory address space; if the 8 bytes are
not sufficient, step 5, shown here, is a jump to an extension of the
interrupt routine. Responsibility for saving the state of the processor
(beyond the return from subroutine pointer pushed automatically into the
stack by RST n) is up to the programmer coding the interrupt response
routine.


What Interrupts Do


Thus, interrupts were devised. Indeed,
some computer scientists feel that the
major difference between the second
and third generation machine was not
just the transition to integrated circuits,
but the advent of the interrupt-driven
machine as well. But exactly what
happens when something from the outside
world or a condition internal to the
processing wants attention?

Suppose the processor is hardwired
with at least one interrupt line, and
probably more. When an interrupt occurs,
the desired effect is to:


• Store all the information regarding
the presently running program which
is necessary to resume execution at
the same point some time in the
future, to prevent having to start it
again from the beginning. This
includes the program counter, any
status information, and, optionally,
the processor registers. This
state-saving activity must be complete or
unpredictable behavior can ensue
upon return to the interrupted process.

• Insert into the program counter the
address of the first instruction in the
interrupt program which will handle
the condition causing the interrupt.
When the interrupt routine is finished,
the status register(s), program
counter, and processor registers of
the interrupted program may be
restored and the interrupted program
resumes running without being aware
that it was temporarily not in control
of the processor. This process of
restoring the machine state is the
inverse of the state-saving activity.


[Figure 2 diagram. Legible labels: wired "OR" interrupt line; devices
answering “I didn't do it!” and one answering “I did it!”; “WHODUNIT”
input port; color key.]

Figure 2: The “Who Done It” problem on interrupts. Some means must be
provided to determine which I/O device requires service when more than one
device shares an interrupt line on any processor. Here is one way of
determining “who done it”: The input port “WHO-DUNIT” looks at 8 single-bit
status flags corresponding to up to eight devices; if a flag is on, then
that device “did it.”


Interrupt Hardware


The actual hardware included to effect
interrupts varies somewhat from


one processor to the next. Virtually all 
of them save the old program counter in 
some specified location and insert the 
address of the interrupt handler’s first in- 
struction into the program counter. This 
is an unconditional branch to a 
subroutine with linkage for return after 
interrupt processing. Each machine is 
different, though, in the actions taken 
beyond these two basic functions. In the 
IBM 370 series, the hardware does prac- 
tically everything for the programmer. 
In microprocessors, the software inter- 
rupt program must do some of the things 
that the hardware does in the larger 
machines. Let’s look at the most popular 
microprocessors and see what they do. 


Interrupts on the Intel 8080 


When an interrupt request is received, 
the 8080 completes the current instruc- 
tion before taking any action on the in- 
terrupt. Virtually all miniprocessors and 
microprocessors do this, since there 
would be all sorts of problems en- 
countered if an interrupt were recog- 
nized in the middle of the execution of 
an instruction. A little thought will show 
why. The 8080 does not increment the 
program counter. The program counter 
for the old program is pushed (ie: saved) 
onto the stack. The next instruction to 
be executed is jammed onto the data 
bus by external interrupt circuitry and is 
called the restart instruction. Depending 
on the restart instruction operand, the 
next instruction executed (ie: the ad- 
dress placed into the program counter) 
may be one of eight possible decimal 
memory locations: 0, 8, 16, 24, 32, 40, 48, 
or 56. 8080 programmers will note that 
there are just enough memory locations, 
eight, between these addresses to save 
the registers of the old program, disable 
further interrupts, and execute a jump to 
another location, which in this case will 
be the interrupt service routine. This en- 
tire operation is explained in figure 1. 
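
The restart arithmetic above can be checked with a short sketch. The opcode values are taken from the standard 8080 instruction set rather than from this article, and the handler address 0x2000 is a hypothetical choice made only for the example.

```python
def rst_opcode(n):
    """Opcode byte for RST n; the 8080 encodes it as binary 11nnn111."""
    if not 0 <= n <= 7:
        raise ValueError("RST operand must be 0 through 7")
    return 0xC7 | (n << 3)


def rst_vector(n):
    """Address loaded into the program counter by RST n."""
    return 8 * n


# One possible 8-byte stub for a restart location (handler address assumed):
HANDLER = 0x2000
stub = [
    0xF5,                     # PUSH PSW  save accumulator and flags
    0xC5,                     # PUSH B
    0xD5,                     # PUSH D
    0xE5,                     # PUSH H
    0xF3,                     # DI        disable further interrupts
    0xC3,                     # JMP       to the rest of the routine
    HANDLER & 0xFF,           #           low byte of the address
    HANDLER >> 8,             #           high byte of the address
]

for n in range(8):
    print(n, hex(rst_opcode(n)), rst_vector(n))
print(len(stub))
```

The stub fills the eight locations exactly, which is the sense in which there are "just enough" bytes between restart addresses.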

Obviously, if you ever contemplate 
using all eight classes of interrupts, you 
should be sure not to program using the 
first sixty-four memory locations since 
those are reserved by the hardware for 
interrupt handling. But what if you want 
to have only one class of interrupt? For 
example, you have a panel switch that 
you can push to get the attention of the 
machine. Then just program the par- 
ticular location that you, or the com- 
puter hardware designer, hardwired in. 
Suppose for a minute that you need 
more than eight interrupts. It is possible, 
within a few restrictions as shown in 


figure 2. Just OR the interrupt request 
lines from the outside world together 
and feed them to the same interrupt line 
going into the processor. But then how 
do you know which device has caused 
the interrupt? Obviously there will have 
to be another signal somewhere to in- 
dicate which device needs attention. 
This could be implemented in a variety 
of ways: 


• The device could place an identifying
number on the data bus which would
identify the device.

• An input port could be wired so that
the device would signal that it needed
attention.

• The processor could send an
interrogation to each device connected to
that interrupt line asking if it was the
one that sent the request.


The first and second methods are 
faster since the device number or input 
data could be used as an index to go to 
the appropriate interrupt handler pro- 
gram. The third method is called polling 
and may be somewhat time consuming 
if many devices use the same interrupt 
line. Because so much of the interrupt 
logic of the 8080 is external to the chip, 
there can be considerable variation. 
Most 8080 systems use a simple restart 
(RST) operation code, but any instruc- 
tion including jump (JMP) or call (CALL) 
can be used with appropriate external 
logic. 
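
The second method, an input port with one status flag per device, might be handled in software along the following lines. This is a sketch under assumptions: the mapping of bit i to device i, the rule that the lowest-numbered device is served first, and the handler table are all hypothetical.

```python
def who_done_it(port_value):
    """Return the device numbers (0 to 7) whose flag bit is set, lowest first."""
    return [bit for bit in range(8) if port_value & (1 << bit)]


def dispatch(port_value, handlers):
    """Call the handler of each requesting device; return the order served."""
    served = []
    for device in who_done_it(port_value):
        handlers[device]()          # index straight into the handler table
        served.append(device)
    return served


log = []
handlers = {n: (lambda n=n: log.append("device %d serviced" % n))
            for n in range(8)}

order = dispatch(0b00100100, handlers)   # flags for devices 2 and 5 are on
print(order, log)
```

Using the flag position as an index into a table avoids interrogating every device in turn, which is why the text calls this faster than polling.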


Motorola 6800 Interrupts 


This chip has the capability of 
decoding and servicing a smaller 
number of interrupts, but in a more 
automatic way than the 8080. The 6800 
uses an indirect, vectored interrupt 
situation in which each source of an in- 
terrupt looks up a unique vector loca- 
tion for the address of its service 
routine. When an interrupt is indicated 
to the 6800 by one of three possible 
sources, the processor automatically 
saves the two accumulators, index 
register, status register, and program 
counter on the stack, and in the process 
of doing so changes the stack pointer. 
Thus, the 6800 has the advantage of 
never requiring program code to achieve 
state saving functions. It simultaneously 
has the disadvantage of always perform- 
ing a complete state save so there is no 
way to “cut corners” and save time by 
ignoring the saving and restoring of data 
which is not changed by the interrupt 
routine. This vectoring method also has
the disadvantage of requiring that the
stack pointer never be used for other
purposes, such as a pseudo-index
register, when interrupts are possible.
The three interrupts possible on the 6800
are:


• Maskable Interrupt (IRQ). This interrupt
occurs when a hardware signal
causes a low state on the IRQ line of
the processor. This line is always
wired in a wired-OR configuration
when multiple sources are used, so
some form of polling or priority logic
is needed to identify sources. When
an interrupt occurs, a flag is set in the
processor that prevents a second
interrupt from interrupting the routine
which processes the first to arrive.

• Nonmaskable Interrupt (NMI). This
interrupt is identical to the IRQ
interrupt except that no masking of
repeated interrupts occurs in the
processor to prevent conflicts. As a
result, without external logic to do
the masking, only one interrupt
source should be dedicated to this
signal. Motorola intended this line to
be used with the absolute highest
priority external signal in a typical
system: the signal that indicates a 110
VAC main power supply failure in a
dedicated application system. The
interrupt response routine in such a
case typically would have enough
time before the capacitors of the
power supply discharge to save the
state of the processor and prepare for
later return of power. But the intended
use does not mean the only use,
and with proper care this interrupt
line can be used for inputs as diverse
as a direct memory access (DMA)
controller or real-time clock.

• Software Interrupt (SWI). This interrupt
occurs when a program executes
a software interrupt instruction. The
actions taken are exactly the same as
those for the totally asynchronous
NMI and IRQ hardware inputs. The
only difference is that the SWI is not
asynchronous: it occurs at a known
point in the program, whereas an
interrupt such as NMI or IRQ can occur
at any time relative to the execution of
a program. Thus the SWI instruction is a
call to an interrupt subroutine, with
return implemented via a return from
interrupt (RTI) instruction.


There is one further method of interrupting
a process in the 6800 that is not
characterized by the state saving needed
to effect a true interrupt-style action.
This is use of the reset (RES) line of the
hardware. This form of interruption
merely causes an unconditional branch
to a restart location and is typically used
to initialize the system or to recover
from disastrous errors.

All four sources of interruption of the 
6800 processor, IRQ, NMI, SWI, and 
RES, use a similar, indirect, vectored ap- 
proach to locating the address of the 
desired routine. In the cases of IRQ, 
NMI, and SWI the desired routine is a 
subroutine which returns via an RTI in- 
struction; in the case of RES the desired 
routine is the beginning of the software 
which gains control when the processor
is restarted.

In each case, the processor uses a 
2-byte address stored in the region from 
hexadecimal address FFF8 to FFFF in 
memory address space as the starting 
address for the desired routine. Thus, for 
example, suppose a source of an inter- 
rupt changes the state of the IRQ line, 
causing an IRQ interrupt. The processor 
first completes the previous instruction,
as noted earlier. Then, instead of
executing the next instruction, it executes
the built-in state-saving sequence.
After state saving, the processor
sends out the address FFF8 to memory,
from which it obtains the high-order
byte of the address of the interrupt routine.
Then it sends out the address FFF9, from
which it obtains the low-order byte of
that address. It then branches to
the interrupt routine at the address just
obtained. A similar process occurs for
the NMI response using the data con- 
tained in locations FFFC and FFFD as an 
address; for the SWI response using data 
contained in locations FFFA and FFFB as 
an address; and for the RES response us- 
ing data contained in locations FFFE and 
FFFF as an address. 
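
The vector lookup just described can be sketched as follows. The vector addresses are the ones given in the text; the dictionary standing in for memory and the routine address 0x1234 are assumptions made for the example.

```python
# Vector locations from the text: high-order byte first, then low-order byte.
VECTORS_6800 = {"IRQ": 0xFFF8, "SWI": 0xFFFA, "NMI": 0xFFFC, "RES": 0xFFFE}


def fetch_vector_6800(memory, source):
    """Return the 16-bit routine address stored at the vector for `source`."""
    base = VECTORS_6800[source]
    high = memory[base]          # first fetch: high-order byte
    low = memory[base + 1]       # second fetch: low-order byte
    return (high << 8) | low


# Hypothetical memory contents: the IRQ routine lives at 0x1234.
memory = {0xFFF8: 0x12, 0xFFF9: 0x34}
print(hex(fetch_vector_6800(memory, "IRQ")))
```

Because the processor only ever reads these eight bytes, they can sit in ROM while the routines they point to live anywhere in the address space.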


The MOS Technology 6502 


This 8-bit processor is very similar to
the 6800 in its processing of interrupts.
There is no separate vector for a software
interrupt as implemented in the
6800, so the 6502's interrupt vector
region only includes nonmaskable
interrupts (ie: FFFA and FFFB contain the
address), reset (ie: FFFC and FFFD contain
the address), and maskable interrupts
(ie: FFFE and FFFF contain the address).
The 6502's BRK instruction is similar to
the 6800's SWI, except it uses the same
vector location as the maskable
interrupt IRQ, rather than a separate address
vector.
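
A comparable sketch for the 6502 follows. Two details in it come from the 6502's documented behavior rather than from this article: the vector is stored low-order byte first, and a handler can tell BRK from a hardware IRQ by testing the break flag (bit 4) in the status byte saved on the stack. The memory contents are hypothetical.

```python
VECTORS_6502 = {"NMI": 0xFFFA, "RES": 0xFFFC, "IRQ": 0xFFFE}
BREAK_FLAG = 0x10    # bit 4 of the status byte pushed on the stack


def fetch_vector_6502(memory, source):
    """Return the routine address; the 6502 stores the low-order byte first."""
    base = VECTORS_6502[source]
    return memory[base] | (memory[base + 1] << 8)


def was_brk(stacked_status):
    """BRK and IRQ share one vector, so the handler inspects the saved status."""
    return bool(stacked_status & BREAK_FLAG)


memory = {0xFFFE: 0x34, 0xFFFF: 0x12}    # hypothetical IRQ/BRK routine at 0x1234
print(hex(fetch_vector_6502(memory, "IRQ")), was_brk(0x30), was_brk(0x20))
```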




Interesting Uses


Now, knowing about interrupts, what
are their uses on the personal computer
system, and what kinds of programming
should we use with them? Probably a
majority of users will not need to use
interrupts at all, at least until they have
several years of programming experience.
If you have an 8080, just be careful to
write your programs around the critical
interrupt locations in low memory
addresses, in case sometime in the future
you decide to start using them. If you
have a 6800 and use a dedicated
monitor such as JBUG or MIKBUG,
much of your freedom to use interrupts
is replaced by hardwired response
vectors in ROM found at FFF8 to FFFF.
Almost certainly if you plan on writing
or using some type of operating system,
the interrupt facilities will need to be
used in the interrupt routines.

The use of interrupts for I/O operations
probably will not be a major
application except in cases of direct
memory access or fast peripherals.
Personal systems tend to be strongly
oriented to a memory conservative type
of programming, since the cost of the
processor hardware is so low to begin
with, and the slowness of I/O is not
really a significant factor.


[Figure 3 diagram. Legible labels: memory address space with vector
locations FFFE,FFFF; FFFC,FFFD; FFFA,FFFB; and FFF8,FFF9 (the IRQ location);
processor stack; step 1, external interrupt event, input on IRQ; step 2,
push old processor state into stack; use data in IRQ location to call IRQ
routine; color key: memory which can be ROM (and often is), memory which
must be programmable scratchpad.]

Figure 3: The 6800 processor's interrupt structure. This vectored interrupt
method starts with an interrupt signal to the processor. In this example,
IRQ occurs, so the processor generates a reference to the IRQ vector
location at hexadecimal FFF8 and FFF9. The two-byte address at the IRQ
vector location in turn points to the IRQ routine somewhere else in address
space and as the last step in the process, the routine is called. As part
of the special interrupt routine call, the old state information is pushed
onto the stack.


Real-time applications are likely to
abound in small systems. The timers that
are included in some systems often
operate by allowing the program to load
a desired number, which is then counted
down (or added up, depending on the
hardware) independent of the processor.
When 0 is reached, the timer can
generate an interrupt. This could be
useful in such applications as keeping
track of how long programs use the
processor, allowing a player a limited
amount of time to make a move in
games like Star Trek, generating time of
day applications, and so on. A very
interesting real-time application of
interrupts is in the use of light pens on
oscilloscope graphics displays. This is
a use of computers that many hobbyists,
upon seeing it operate for the
first time, feel is just this side of magic.
Actually, when you consider how the
oscilloscope display is generated, the
mechanism is very straightforward. You
can deduce that the computer, or I/O
device, must know where the beam
is currently positioned on the
tube's screen, or else it would be just a
jumbled mess. Therefore, if a photosensitive
device is placed close to the
screen, an interrupt may be generated
when the light beam strikes the cell. This
interrupt may cause the location of the
beam to be noted by storing the current
data in the counters used to control
the beam.

Another extremely interesting application
is the emulation of hardwired
instructions. If the processor allows
software or illegal instruction interrupts,
then software routines may be programmed
to produce the same effect as
if the desired instruction had actually
been included in the silicon on the chip.
For example, suppose that you frequently
needed an instruction which
took, for some unfathomable reason,
the contents of all the registers and
output them to a teleprinter. You could
keep a subroutine in each program that
performed this action. But if you found
that you needed this instruction frequently
in every program you ran on the
machine, another way of implementing
the routine would be to place into the
program's code something to
cause an interrupt.

This interrupt would cause the interrupt
routine to determine which action
was desired, execute it, and then resume
the interrupted program. Of course, the
instruction would be executed much
more slowly than if hardwired. Once the
routine was finalized, it could be burned
into read-only memory, and from then
on would always be available for the
programmer's use.

The actual bit pattern inserted into
the program to cause the interrupt
varies with the processor. If there are
unimplemented operation codes, then


you may simply choose one and use it to 
signify the new operation from then on. 
If unimplemented operation codes do 
not exist or if they cause the machine to 
“hang up” and not interrupt, then a soft- 
ware interrupt, called a supervisor call 
on the IBM 370, may be used. 

This is somewhat less pleasing, 
however, since the code on the program 
listing will always look the same (ie: a 
software interrupt) and make debugging 
a bit more difficult. The 6800’s SWI in- 
struction with its separate vector is 
ideally suited to this use. Obviously, a 
byte would have to be stored 
somewhere, signifying to the interrupt 
routine which operation was desired. In 
a 6800 this would be accomplished by 
following the SWI instruction with the 
appropriate 1-byte code and modifying 
the stack so that RTI returns control 1
byte past its normal point of return.
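
The SWI-plus-code-byte scheme can be sketched as a simulation. The operation table, the two emulated "instructions", and the tiny program are hypothetical; 0x3F is the 6800's SWI opcode, a detail taken from the 6800 instruction set rather than from this article.

```python
SWI = 0x3F    # 6800 software interrupt opcode (not stated in the article)


def dump_registers():
    return "registers dumped"


def ring_bell():
    return "bell"


# Hypothetical table mapping the 1-byte code after SWI to an emulated operation.
OPERATIONS = {0x01: dump_registers, 0x02: ring_bell}


def swi_handler(memory, stacked_pc):
    """Emulate one operation; return the adjusted return address and result.

    stacked_pc is the return address saved by SWI, which points at the
    code byte that follows the SWI instruction.
    """
    code = memory[stacked_pc]           # which operation was desired
    result = OPERATIONS[code]()         # execute it
    return stacked_pc + 1, result       # RTI must resume 1 byte further on


program = [SWI, 0x02, 0x01]             # SWI at address 0, code byte at address 1
new_pc, result = swi_handler(program, stacked_pc=1)
print(new_pc, result)
```

Returning the adjusted address stands in for modifying the program counter saved on the stack, so that the code byte is never executed as an instruction.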

It is possible to reproduce a particular 
machine’s entire instruction set on 
another entirely different machine in 
this manner. This is frequently called 
emulation, although the term is also 
used to describe this process being ac- 
complished by microcode which, con- 
fusingly enough, is only remotely 
related to microprocessors. 


Conclusion 


We have seen that the use of inter- 
rupts allows computers to become more 
versatile than when they are dedicated 
to one program. Interrupts allow the 
machine to interact with the outside 
world, while at the same time allowing it 
to pursue its own interests. Interrupts 
are useful for accomplishing things in 
ways which, while perhaps more
difficult to program initially, may be
worthwhile in the ease of application.


REFERENCES 


1. Intel 8080 Microcomputer Systems User's
Manual. Intel Corporation, Santa Clara CA,
July 1975.

2. M6800 Systems Reference and Data Sheets.
Motorola Semiconductor Products Inc,
Phoenix AZ.

3. MCS6500 Microcomputer Family Programming
Manual. MOS Technology, Norristown PA.




William B Noyce 


Whatever size computer one works
with, there is usually pressure to make it
perform a given task in less time or less
memory. Optimization techniques are
methods for accomplishing such speed
or memory improvements. Usually the
most effective changes to a program are
algorithmic changes. These are changes
in the strategy the program uses to
achieve its result. An algorithmic change
can reduce the time a program takes to
run by 50% to 90%. For example, using
the well-known quick sort or heap sort
instead of a bubble sort to sort long lists
can have this effect.

Sometimes, however, significant
results can be achieved by coding
changes, in which the modified program
does essentially the same thing as the
previous version, but in a better way.
Most compilers perform optimizations
of this type, such as keeping in a register
an expression whose value is used more
than once, rather than recomputing it
whenever it is needed. Coding changes
exploit simple mathematical or
logical identities.

This article follows through the
step-by-step processes used to reduce by
about 25% the time and space taken
by a small subroutine. The example
subroutine is the “Novel 8 Bit
Multiplication” by Christopher D
Glaeser (July 1977 BYTE, page 142), that
is reproduced in listing 1.

Coding changes are not effective at
reducing the time taken by a program


unless they are applied to the most 
heavily used parts of the program. If 
some part only accounts for 2% of the 
time used by the program, no optimiza- 
tions applied only to this part can speed 
up the program by more than 2%. Usual- 
ly, the most heavily used parts of a pro- 
gram are inside commonly used 
subroutines or deeply nested loops. 
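
The 2% bound can be made concrete with a little arithmetic. The function below is an illustrative sketch, not something from the article: it assumes a part that accounts for a given fraction of the run time is made faster by a given factor while everything else is unchanged.

```python
def time_saved(fraction, local_speedup):
    """Fraction of total run time saved when one part runs `local_speedup`
    times faster and the rest of the program is unchanged."""
    return fraction * (1.0 - 1.0 / local_speedup)


# A part that takes 2% of the time, made 1000 times faster:
small = time_saved(0.02, 1000)

# A loop that takes 80% of the time, made only twice as fast:
large = time_saved(0.80, 2)

print(small, large)
```

However large the local speedup, the saving can never exceed the fraction itself, which is why effort belongs in the most heavily used code.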

The eight instruction loop starting at
LOOP1 in listing 1 accounts for about 80%
of the time in the multiply subroutine.
The loop works by testing, from right to
left, bits of the number passed in C, and
adding the number passed in D to the
appropriate position on the partial
product. The number whose bits are tested
is called the multiplier, and the number
that is added is called the multiplicand.
The partial product is kept in HL,
because it accumulates a 16-bit sum.
Since the multiplicand needs to shift
left, it is kept in DE, and an XCHG
instruction moves it into HL to be shifted. If we
can eliminate the need to shift the
multiplicand we can save the XCHG
instructions and a little setup code.

The original loop computes:

$$2^0 P_0 + 2^1 P_1 + 2^2 P_2 + 2^3 P_3 + 2^4 P_4 + 2^5 P_5 + 2^6 P_6 + 2^7 P_7$$

where $P_i$ = the multiplicand if bit $i$ of
the multiplier is 1, or 0 if the bit is 0. This
expression is equivalent, by the
distributive law, to:

$$((((((P_7 \cdot 2 + P_6) \cdot 2 + P_5) \cdot 2 + P_4) \cdot 2 + P_3) \cdot 2 + P_2) \cdot 2 + P_1) \cdot 2 + P_0$$
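
The equivalence of the two expressions can be checked exhaustively for 8-bit operands. This is a plain arithmetic sketch with hypothetical function names; it does not model the 8080 registers.

```python
def weighted_sum(multiplier, multiplicand):
    """Sum of 2**i * P_i, the form computed by the original loop."""
    return sum((2 ** i) * (multiplicand if (multiplier >> i) & 1 else 0)
               for i in range(8))


def nested_form(multiplier, multiplicand):
    """The nested form: double the running total, then add P_i, from P_7 down."""
    total = 0
    for i in range(7, -1, -1):
        p_i = multiplicand if (multiplier >> i) & 1 else 0
        total = total * 2 + p_i
    return total


all_equal = all(
    weighted_sum(c, d) == nested_form(c, d) == c * d
    for c in range(256)
    for d in range(256)
)
print(all_equal)
```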




                ; MULTIPLY C BY D, GIVING BC.
                ; PRESERVES D,E,H,L.
                ;
                ; ORIGINAL PROGRAM BY CHRISTOPHER D GLAESER
                ;
                ; 27 BYTES, ABOUT 512 CYCLES.
                ;
0000 D5         MULT1:  PUSH D      ; SAVE REGISTERS
0001 E5                 PUSH H
0002 5A                 MOV  E,D    ; MOVE MULTIPLICAND TO DE (LOWER)
0003 1600               MVI  D,0    ; AND CLEAR DE (UPPER)
0005 6A                 MOV  L,D    ; CLEAR PRODUCT AREA
0006 62                 MOV  H,D
0007 0608               MVI  B,8    ; LOOP COUNTER
0009 79                 MOV  A,C    ; PUT MULTIPLIER WHERE IT CAN SHIFT
000A 1F         LOOP1:  RAR         ; TEST NEXT MULTIPLIER BIT
000B D20F00             JNC  SKIP1
000E 19                 DAD  D      ; ADD MULTIPLICAND
000F EB         SKIP1:  XCHG        ; SWAP DE WITH HL
0010 29                 DAD  H      ; SHIFT MULTIPLICAND LEFT
0011 EB                 XCHG        ; SWAP BACK
0012 05                 DCR  B
0013 C20A00             JNZ  LOOP1  ; REPEAT 8 TIMES

0016 44                 MOV  B,H    ; MOVE PRODUCT TO BC
0017 4D                 MOV  C,L
0018 E1                 POP  H      ; RESTORE REGISTERS
0019 D1                 POP  D
001A C9                 RET         ; RETURN


Listing 1: The starting point for this case study in optimization is a 
routine by Christopher D Glaeser, which appeared in July 1977 BYTE on 
page 142. This listing reproduces Christopher’s multiplication 
algorithm, which takes 27 bytes of memory and about 512 cycles of the 
processor clock. 


                ; MULTIPLY C BY D, GIVING BC.
                ; PRESERVES D,E,H,L.
                ;
                ; SHIFT PRODUCT INSTEAD OF MULTIPLICAND.
                ; SHIFT MULTIPLIER LEFT INSTEAD OF RIGHT.
                ;
                ; 25 BYTES, ABOUT 448 CYCLES.
                ;
001B D5         MULT2:  PUSH D      ; SAVE REGISTERS
001C E5                 PUSH H
001D 5A                 MOV  E,D    ; MOVE MULTIPLICAND TO DE (LOWER)
001E 1600               MVI  D,0    ; AND CLEAR DE (UPPER)
0020 6A                 MOV  L,D    ; CLEAR PRODUCT AREA
0021 62                 MOV  H,D
0022 0608               MVI  B,8    ; LOOP COUNTER
0024 79                 MOV  A,C    ; PUT MULTIPLIER WHERE IT CAN SHIFT

0025 29         LOOP2:  DAD  H      ; SHIFT PRODUCT LEFT
0026 17                 RAL         ; TEST NEXT MULTIPLIER BIT
0027 D22B00             JNC  SKIP2
002A 19                 DAD  D      ; ADD MULTIPLICAND
002B 05         SKIP2:  DCR  B
002C C22500             JNZ  LOOP2  ; REPEAT 8 TIMES

002F 44                 MOV  B,H    ; MOVE PRODUCT TO BC
0030 4D                 MOV  C,L
0031 E1                 POP  H      ; RESTORE REGISTERS
0032 D1                 POP  D
0033 C9                 RET         ; RETURN


Listing 2: By rearranging the code of the inner loop so that an equivalent 
operation is performed, some time can be saved. This, for an 8080, in- 
volves changing the order of shifting of the multiplier and using a dou- 
ble precision addition operation as the equivalent of a shift. This 
modified routine takes 25 bytes and executes in about 448 cycles. 




This latter expression shows how we can 
shift the product left after every addi- 
tion except the last, if we always add the 
multiplicand into the lower byte. If we
added the multiplicand into the upper 
byte, we could shift the product to the 
right after every addition, but the 8080 
has no 16-bit right-shift instruction. The 
change to shift the product left requires 
that we examine the leftmost multiplier 
bits first: the new inner loop appears in 
listing 2. Note that the product is shifted 
at the beginning of the loop; this is so it 
does not get shifted after the last time 
through. 
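
The shift-left scheme described above can be traced in a few lines of Python. This is only a sketch of the arithmetic, not a translation of listing 2: the function name and the 16-bit masking are illustrative assumptions, and the 8080's registers are modeled as plain integers.

```python
def multiply_shift_left(multiplicand: int, multiplier: int) -> int:
    """8-bit by 8-bit multiply that examines the multiplier's leftmost bit first."""
    product = 0
    for bit in range(7, -1, -1):
        # Shift at the top of the loop, so the product is not shifted
        # again after the last addition (the role of DAD H in listing 2).
        product = (product << 1) & 0xFFFF
        if (multiplier >> bit) & 1:
            # Add the multiplicand into the low byte of the 16-bit product.
            product = (product + multiplicand) & 0xFFFF
    return product
```

Calling `multiply_shift_left(13, 11)` walks through the eight partial products P7 down to P0 in the order given by the nested expression.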

Since the product is shifted left eight 
times, there is no need to clear its upper 
half initially with a MOV H, D instruc- 
tion. Whatever garbage is in H will be 
shifted off and have no effect on the 
subroutine’s result. But we can put these 
unused bits to work instead of wasting 
them. After n times through the loop 
there are 8 — n bits remaining in the 
multiplier and 8 — n unused bits in H, 
since the partial product occupies only 8
+ n bits. The product and multiplier can
coexist peacefully in HL, and every time 
the product is shifted, a bit of the 
multiplier falls out into the carry. We 
can thus eliminate the RAL instruction 
which shifted multiplier bits into the 
carry. The new subroutine appears in 
listing 3. 
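
The idea of letting the multiplier and the product share one 16-bit register can be checked with a small Python model. It is a sketch under stated assumptions: `hl` stands in for the HL register pair, the carry flag is modeled as bit 16 of the shifted value, and the function name is hypothetical.

```python
def multiply_shared_register(multiplicand: int, multiplier: int) -> int:
    """Model of the shared-register trick: multiplier in H, product grows in L."""
    hl = (multiplier & 0xFF) << 8      # multiplier in the high byte, low byte cleared
    for _ in range(8):
        hl <<= 1                       # one 16-bit shift moves product and multiplier
        carry = hl >> 16               # the multiplier bit that fell out the top
        hl &= 0xFFFF
        if carry:
            hl = (hl + (multiplicand & 0xFF)) & 0xFFFF
    return hl                          # all multiplier bits are gone; only the product remains
```

After n passes the partial product needs at most 8 + n bits, so an addition never carries into the multiplier bits still waiting in the high end of the register.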

Where else can we save time or
space? The user of the original
subroutine obviously did not care
whether the input values of registers A
and C were preserved, but program 3
does not use these registers in the loop.
Instead, it uses B, D and E, and saves the
input values of D and E on the stack. We
can save time and space by using A and C
in the loop. Register pair DE was
used as the multiplicand because we 
needed to use XCHG; DAD H; XCHG to 
shift it, but we no longer shift the 
multiplicand. Because we want our 
multiplicand in the lower byte of a 
register pair, we should use the number 
passed in C as the multiplicand and the 
number passed in D as the multiplier. 
This is legal because of the com- 
mutative law. Effectively, we save the 
MOV E, D which moves the multipli- 
cand to the lower half of its register pair, 
and other instructions are changed. We 
can keep the loop counter in A, which is 
no longer needed for the multiplier. 
Now the subroutine no longer modifies 
D or E, so the PUSH D and POP D in- 
structions may be deleted. The savings 
in stack space may or may not be impor- 
tant, depending on other parts of the 




                 ; MULTIPLY C BY D, GIVING BC.
                 ; PRESERVES D,E,H,L.
                 ;
                 ; KEEP MULTIPLIER AND PRODUCT TOGETHER IN HL.
                 ;
                 ; 23 BYTES, ABOUT 411 CYCLES.
                 ;
0034 D5          MULT3:  PUSH D      ; SAVE REGISTERS
0035 E5                  PUSH H
0036 5A                  MOV  E,D    ; MOVE MULTIPLICAND TO DE (LOWER)
0037 1600                MVI  D,0    ; AND CLEAR DE (UPPER)
0039 6A                  MOV  L,D    ; CLEAR LOW PRODUCT AREA
003A 61                  MOV  H,C    ; MOVE MULTIPLIER TO HIGH PRODUCT AREA
003B 0608                MVI  B,8    ; LOOP COUNTER

003D 29          LOOP3:  DAD  H      ; SHIFT MULTIPLIER AND PRODUCT LEFT
003E D24200              JNC  SKIP3  ; TEST A MULTIPLIER BIT
0041 19                  DAD  D      ; ADD MULTIPLICAND
0042 05          SKIP3:  DCR  B
0043 C23D00              JNZ  LOOP3  ; REPEAT 8 TIMES

0046 44                  MOV  B,H    ; MOVE PRODUCT TO BC
0047 4D                  MOV  C,L
0048 E1                  POP  H      ; RESTORE REGISTERS
0049 D1                  POP  D
004A C9                  RET         ; RETURN


                 ; MULTIPLY C BY D, GIVING BC.
                 ; PRESERVES D,E,H,L.
                 ;
                 ; THE GREAT REGISTER SHUFFLE. MULTIPLICAND IS IN BC,
                 ; MULTIPLIER IN H, AND LOOP COUNTER IN A. DE IS NOT USED.
                 ;
                 ; 20 BYTES, ABOUT 385 CYCLES.
                 ;
004B E5          MULT4:  PUSH H      ; SAVE REGISTERS
004C 0600                MVI  B,0    ; CLEAR MULTIPLICAND HIGH BYTE
004E 68                  MOV  L,B    ; CLEAR PRODUCT LOW BYTE
004F 62                  MOV  H,D    ; MOVE MULTIPLIER TO HIGH PRODUCT AREA
0050 3E08                MVI  A,8    ; LOOP COUNTER

0052 29          LOOP4:  DAD  H      ; SHIFT MULTIPLIER AND PRODUCT LEFT
0053 D25700              JNC  SKIP4  ; TEST A MULTIPLIER BIT
0056 09                  DAD  B      ; ADD MULTIPLICAND
0057 3D          SKIP4:  DCR  A
0058 C25200              JNZ  LOOP4  ; REPEAT 8 TIMES

005B 44                  MOV  B,H    ; MOVE PRODUCT TO BC
005C 4D                  MOV  C,L
005D E1                  POP  H      ; RESTORE REGISTERS
005E C9                  RET         ; RETURN


Listing 4: By doing “the great register shuffle,” further improvement can be accomplished by passing parameters in
s. This version chips away at time requirements and requires only 385 cycles, with 20 bytes of code. 


                 ; MULTIPLY C BY D, GIVING BC.
                 ; PRESERVES D,E,H,L.
                 ;
                 ; LOOP IS PARTIALLY UNROLLED.
                 ;
                 ; 28 BYTES, ABOUT 325 CYCLES.
                 ;
005F E5          MULT5:  PUSH H      ; SAVE REGISTERS
0060 0600                MVI  B,0    ; CLEAR MULTIPLICAND HIGH BYTE
0062 68                  MOV  L,B    ; CLEAR PRODUCT LOW BYTE
0063 62                  MOV  H,D    ; MOVE MULTIPLIER TO HIGH PRODUCT AREA
0064 3E04                MVI  A,4    ; LOOP COUNTER (HALF NORMAL SIZE)

0066 29          LOOP5:  DAD  H      ; SHIFT MULTIPLIER AND PRODUCT LEFT
0067 D26B00              JNC  SKIP5A ; TEST A MULTIPLIER BIT
006A 09                  DAD  B      ; ADD MULTIPLICAND
006B 29          SKIP5A: DAD  H      ; -- REPEAT LOOP AGAIN --
006C D27000              JNC  SKIP5B
006F 09                  DAD  B
0070 3D          SKIP5B: DCR  A
0071 C26600              JNZ  LOOP5  ; REPEAT 4 TIMES

0074 44                  MOV  B,H    ; MOVE PRODUCT TO BC
0075 4D                  MOV  C,L
0076 E1                  POP  H      ; RESTORE REGISTERS
0077 C9                  RET


Listing 5: After virtually exhausting straightforward improvements of the looping methods, the only further
improvements possible come from unrolling the loop into larger amounts of program memory. This version partially
unrolls the multiplication loop, takes 28 bytes of memory and about 325 cycles.


                 ; MULTIPLY C BY D, GIVING BC.
                 ; PRESERVES D,E,H,L.
                 ;
                 ; LOOP IS FULLY UNROLLED.
                 ;
                 ; 49 BYTES, ABOUT 258 CYCLES.
                 ;
0078 E5          MULT6:  PUSH H      ; SAVE REGISTERS
0079 0600                MVI  B,0    ; CLEAR MULTIPLICAND HIGH BYTE
007B 68                  MOV  L,B    ; CLEAR PRODUCT AREA LOW BYTE
007C 62                  MOV  H,D    ; MOVE MULTIPLIER TO PRODUCT AREA HIGH BYTE

007D 29                  DAD  H      ; SHIFT MULTIPLIER AND PRODUCT LEFT
007E D28200              JNC  SKIP6A ; TEST A MULTIPLIER BIT
0081 09                  DAD  B      ; ADD MULTIPLICAND
0082 29          SKIP6A: DAD  H      ; -- REPEAT LOOP 7 MORE TIMES --
0083 D28700              JNC  SKIP6B
0086 09                  DAD  B
0087 29          SKIP6B: DAD  H
0088 D28C00              JNC  SKIP6C
008B 09                  DAD  B
008C 29          SKIP6C: DAD  H
008D D29100              JNC  SKIP6D
0090 09                  DAD  B
0091 29          SKIP6D: DAD  H
0092 D29600              JNC  SKIP6E
0095 09                  DAD  B
0096 29          SKIP6E: DAD  H
0097 D29B00              JNC  SKIP6F
009A 09                  DAD  B
009B 29          SKIP6F: DAD  H
009C D2A000              JNC  SKIP6G
009F 09                  DAD  B
00A0 29          SKIP6G: DAD  H
00A1 D2A500              JNC  SKIP6H
00A4 09                  DAD  B
                 SKIP6H:             ; -- END OF UNROLLED LOOP --
00A5 44                  MOV  B,H    ; MOVE PRODUCT TO BC
00A6 4D                  MOV  C,L
00A7 E1                  POP  H      ; RESTORE REGISTERS
00A8 C9                  RET

                         END


Listing 6: Perhaps the ultimate 8-by-8 multiply short of a memory intensive full table lookup of answers is this fully 
unrolled version which expands the memory requirements to 49 bytes, but cuts the time requirements to 258 cycles for 
nearly 50% savings relative to the time requirement of the original program. 




program in which the subroutine ap- 
pears, but there is a significant saving in 
time and program size. The final version 
of the subroutine appears in listing 4. It 
is 20 bytes long, compared with 27 bytes 
for the original routine, and takes about 
393 cycles, compared with about 525 
cycles for the original routine. A million 
multiplications with a typical 8080 pro- 
cessor’s clock would take about four 
minutes, 20 seconds with the old version 
and about three minutes, 15 seconds 
with the new. 
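
The timing figures can be reproduced with a short calculation. The article only says "a typical 8080 processor's clock"; the 2 MHz rate used here is an assumption, and the function name is illustrative.

```python
ASSUMED_CLOCK_HZ = 2_000_000  # assumption: a 2 MHz 8080 clock


def run_time(cycles_per_call: int, calls: int = 1_000_000,
             clock_hz: int = ASSUMED_CLOCK_HZ) -> tuple[int, int]:
    """Return (minutes, seconds), truncated to whole seconds, for `calls` multiplications."""
    total_seconds = cycles_per_call * calls // clock_hz
    return divmod(total_seconds, 60)


original = run_time(525)   # cycle count quoted in the text for the original routine
improved = run_time(393)   # cycle count quoted in the text for listing 4
```

Under that assumption the two results come out within a couple of seconds of the rounded times quoted in the text.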

If this is not fast enough, we can
speed up the routine still further, by
unrolling the loop, replicating its
instructions as shown in listings 5 and 6.
This eliminates some or all of the time
taken by the DCR A and JNZ LOOP
instructions which control the loop. In
listing 5 the loop control is only executed
four times instead of eight times, and in
listing 6 there is no loop control at all.
These speedup techniques cost memory, 
however, and tend to make the code 
more confusing. It is common to have to 
trade memory for speed, and which is 
more important depends on the par- 
ticular program. With the long program, 
our million multiplications would take 
only about 2 minutes, 13 seconds, just 
over half as long as the original 
program. 


Low-Level Program
Optimization:

Some Illustrative Cases


James Lewis


A program or subroutine can usually
be modified so that it requires less time
or space for execution. This observation
about optimization suggests that a
program or subroutine can usually be
changed, so that it either runs faster or
takes up less memory space, and one
can often accomplish both at the same
time.

Programs can be optimized for other
things, such as readability,
maintainability, structure, etc. This article,
however, stresses optimization for time
and space. If a program written for a
microprocessor can be made shorter
using space optimization, less memory can
be used, or more functions can be
packed into the same memory. Either
way, optimization pays off. If the
program can be made to run faster, more
functions can be performed in the same
amount of time. In fact, optimization
can make the difference between
whether or not an application of a
microprocessor is feasible.

A distinction can be made between
two types of optimization techniques.
One is code optimization and the other
algorithmic optimization. Code
optimization involves concentrating on the
structure of the actual code on a low
level. This includes such operations as
reordering instruction sequences and
combining two instructions into one
instruction. Algorithmic optimization is on
a high level and involves rethinking the
whole approach to a program or section
of a program. This is much more general
and powerful than code optimization,
but its rules cannot easily be written
down. It takes an experienced
programmer or system designer to perform
algorithmic optimization effectively.
Examples of code-optimization tricks will
be given below.

In the event that a program cannot be 
modified so that both space and time 
are lessened, there is usually the 
possibility of a trade-off. That is, if space 
is decreased, time will increase, and if 
time is decreased, space will increase. 
Only the particular situation can deter- 
mine which route to take. 

How much optimization is possible? 
Experience has shown that upon careful 
analysis a first draft program can 
typically be reduced by as much as 50%
or more in terms of memory space. Time 
optimization is another story. Some pro- 
grams can be accelerated at the expense 
of using more memory. However, signifi- 
cant time reductions can usually be 
made at little expense of memory; in 
fact, there may even be a savings of 
memory. 

How much optimization should be 
done? In the process of optimizing a pro- 
gram, it becomes harder and harder to 
discover more program reductions. How 
far one should go depends on the
relation between the cost of the
programmer’s time and the savings due to
optimizations.

The process of optimization has fringe 
benefits. In analyzing a program, the 
programmer gains a clearer picture of 
how it works and often finds bugs. It is 
clear that a good software engineer 
should spend some time optimizing 
code. 

Before discussing the techniques
themselves, it should be pointed out
that not all of the ideas mentioned are
always beneficial. For example, one of
the tricks reduces the elegance of the
subroutine structure. If this type of
elegance is desired, perhaps the trick
should not be used.

The ideas presented are applicable to
most microprocessors. They are
intended for use on assembly-language
programs, although some of them apply to
other languages. An English assembly
language is used in the examples for
generality. Note that the command
CALL SUB means push the return
address on the stack and then jump to the
subroutine.

The code-optimization examples will
usually be presented in the following
format:


Description of optimization technique

    Example of program          The same program
    before                      shown after
    optimization                optimization


Returning a Call


If a call to a subroutine is followed by a return instruction, the two instructions can
be replaced by a jump to the subroutine.


    CALL ARNOLD                 JUMP ARNOLD
    RETURN


Endless Subroutine


If the last line of a subroutine is a jump to another subroutine, as in the first
example, one can often position the subroutine which is jumped to directly below the jump
instruction, so that the jump instruction is not needed.


            JUMP BETTY          BETTY:  STORE X
    CINDY:  LOAD X

            RETURN                      RETURN
    BETTY:  STORE X             CINDY:  LOAD X

            RETURN                      RETURN


To increase the speed of an important loop, one can expand the loop either partially
or wholly at the expense of space. This works best when the loop has a fixed number
of iterations that is relatively small.


            LOAD IMMEDIATE 10               LOAD IMMEDIATE 5
    LOOP:   CALL DANNY              LOOP:   CALL DANNY
            CALL EDDY                       CALL EDDY
            DECREMENT                       CALL DANNY
            JUMP IF NOT ZERO LOOP           CALL EDDY
                                            DECREMENT
                                            JUMP IF NOT ZERO LOOP


Passing Fixed Data


If a block of data has to be passed to a subroutine, rather than setting up and
passing a pointer to the data, put the data directly following the call and rewrite the
subroutine to look for the data at the return address. This may involve more code in
the data processing subroutine, but can pay off in many cases. The subroutine must
compute a new return address that follows the data, and use this altered return
address instead of the original.


            LOAD ADDRESS OF DATA            CALL FARRAH
            CALL FARRAH             DATA:   BYTES 36,24,36

    DATA:   BYTES 36,24,36


Short tables that have more than 1 byte per entry are easier to work with if the
number of bytes per entry is a power of 2. This may waste some space in the table,
but may save more space and also time in the code which handles the table.
Computing an offset into a table whose entry size is a power of 2 can be done with a
series of shifts instead of the integer multiplication that would otherwise be required.


    TABLE:  BYTES 36,24,26          TABLE:  BYTES 36,24,26,00
            BYTES 36,22,37                  BYTES 36,22,37,00
            BYTES 38,23,38                  BYTES 38,23,38,00
            BYTES 35,20,34                  BYTES 35,20,34,00
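
The shift-based offset calculation can be illustrated in Python using the padded table from the example. This is a sketch only: the names `ENTRY_SHIFT` and `table_entry` are hypothetical, and a Python `bytes` object stands in for the table in memory.

```python
ENTRY_SHIFT = 2  # each entry is padded to 4 bytes, and 4 == 2 ** 2

TABLE = bytes([
    36, 24, 26, 0,
    36, 22, 37, 0,
    38, 23, 38, 0,
    35, 20, 34, 0,
])


def table_entry(index: int) -> bytes:
    """Return the 3 data bytes of an entry, locating it with a shift instead of a multiply."""
    offset = index << ENTRY_SHIFT      # same result as index * 4
    return TABLE[offset:offset + 3]    # the fourth byte is padding
```

With 3-byte entries the offset would need a multiplication by 3 (or a shift plus an add); padding to 4 bytes spends one byte per entry to avoid it.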


Use the Stack


Instead of saving temporary values at some memory location, they can often be
saved on the stack. This usually holds true, even when manipulating data on top of
the stack. The details are too machine dependent to give an example, but some of the
newer microprocessors recognized this by having more than one
hardware-implemented stack pointer.


Combine Instructions


It is sometimes easy to miss the possibility of combining instructions. One situation
which can be missed is when one can combine a symbolic value with a constant at
assembly time rather than at execution time.


    LOAD IMMEDIATE ADDRESS          LOAD IMMEDIATE ADDRESS+1
    ADD IMMEDIATE 1


Multiple Additions


Normally, several ADD IMMEDIATE instructions in a row would be a bad idea. In a
frequent situation, however, it can be very useful. Suppose one wants to pass a
number to a subroutine and have the subroutine return 1, 2, or 3, depending on
whether the passed number was 5, 12, or 13 respectively. Note that the optimization
shown is of space at the expense of some time.


            COMPARE IMMEDIATE WITH 5            COMPARE IMMEDIATE WITH 5
            JUMP IF EQUAL TO ONE                JUMP IF EQUAL TO ONE
            COMPARE IMMEDIATE WITH 12           COMPARE IMMEDIATE WITH 12
            JUMP IF EQUAL TO TWO                JUMP IF EQUAL TO TWO
            LOAD IMMEDIATE 3                    ADD IMMEDIATE -6
            RETURN                      ONE:    ADD IMMEDIATE 6
    ONE:    LOAD IMMEDIATE 1            TWO:    ADD IMMEDIATE -10
            RETURN                              RETURN
    TWO:    LOAD IMMEDIATE 2
            RETURN
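
The arithmetic behind the chained additions can be checked with a small Python model of the optimized routine. It is a sketch under assumptions: the function name is hypothetical, and the accumulator and the three entry points are modeled with ordinary variables.

```python
def classify(value: int) -> int:
    """Model the fall-through ADD IMMEDIATE chain for the inputs 5, 12 and 13."""
    acc = value
    if acc == 5:
        entry = "ONE"          # JUMP IF EQUAL TO ONE
    elif acc == 12:
        entry = "TWO"          # JUMP IF EQUAL TO TWO
    else:
        entry = "TOP"          # no jump taken: fall into the first ADD
    if entry == "TOP":
        acc += -6              # ADD IMMEDIATE -6
    if entry in ("TOP", "ONE"):
        acc += 6               # ONE: ADD IMMEDIATE 6
    acc += -10                 # TWO: ADD IMMEDIATE -10
    return acc
```

Each entry point executes only the additions from its label downward, so 13 - 6 + 6 - 10 gives 3, 5 + 6 - 10 gives 1, and 12 - 10 gives 2.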


[Editor’s note: The techniques presented here tend to produce nonstructured programs. 
The programmer must make a choice between readable structured code and speed op- 
timized code. Structured-programming techniques are recommended for all programs 
not requiring crucial space and time specifications ... RGAC]




Queuing Theory,
The Science of
Wait Control


Part 1: Queue Representation


Len Gorney


How many times have you waited in a
line? Do you always get to a
supermarket checkout counter without
having to wait? Is the pump at the gas
station always open and ready for you as
you drive into the service area? It’s
difficult to imagine anyone going
anywhere and not having to wait in a
line.

Since we are computer oriented, we
should define a waiting line by its proper
name, that is, a queue.

A queue is a waiting line controlled by
the service mechanism. A customer
enters a queue at the tail of the queue,
waits in line until he or she arrives at the
head of the queue, is serviced at the
head of the queue, and, finally, leaves
the queue. At the supermarket a
customer pushes a cart to one of the
lines formed at the checkout area and
waits in a line until finally arriving at the
cash register at the head of that line.
After checking out the purchases, that
customer leaves the queue.


Queue Examples


Other examples of queues can be
found in many areas of our everyday
life. The supermarket checkout queue
is a commercial type of queuing system.
Other commercial queues include the
bank teller queue, the barbershop
queue, the gas station queue, etc. The
field of transportation is not without its
share of queues: traffic lights, turnpike
toll booths, airport runways, loading and
unloading docks are but a few examples.

Of course, we have personal queues.
How about that shelf of books you’re
planning to read some day?


Let’s Have Order


A queue is defined as a waiting line,
and since a waiting line has both a
beginning (ie: tail) and an end (ie: head), a
queue must also have both these
properties.

The head and tail idea implies that
customers entering (ie: being inserted) or
leaving (being deleted) must follow a
definite ordering scheme as members of
the queue. This ordering scheme is
defined as the dispatching discipline of
the queue.

The usual dispatching discipline of a
queue is known as first-in, first-out or
FIFO. An orderly queue exhibits this
scheme. The first person entering the
queue is the first person to receive
service, and the last person entering the
queue is the last person to receive
service. Any person entering after the first
but before the last must spend some

Listing 1: Simple BASIC simulation of a row queue. Pseudo-random-
number generation is done to ensure that the queue simulation works
correctly as described in the text. A sample run of the program is also
shown.


1000 DIM Q(5)
1001 REM
1002 REM INITIALIZE QUEUE TO EMPTY STATE
1003 REM
1010 FOR J2 = 1 TO 5
1020 Q(J2) = -9
1030 NEXT J2
1031 REM
1032 REM INITIALIZE TAIL TO HEAD OF QUEUE
1033 REM
1040 T = 5
1041 REM
1042 REM START OF MAIN SIMULATION LOOP
1043 REM
1050 FOR J2 = 1 TO 15
1051 REM
1052 REM GENERATE A RANDOM NUMBER TO DETERMINE
1053 REM AN INSERTION WHEN N <= 5
1054 REM A DELETION WHEN N >= 6
1055 REM
1060 N = INT(RND(1)*10) + 1
1070 PRINT "NUMBER ="; N;
1080 IF N <= 5 GOSUB 1170
1090 IF N >= 6 GOSUB 1240
1091 REM
1092 REM PRINT QUEUE CONTENTS
1093 REM PRINT TAIL POINTER VALUE
1094 REM
1100 PRINT "QUEUE=";
1110 FOR J3 = 1 TO 5
1120 PRINT Q(J3);
1130 NEXT J3
1140 PRINT "TAIL="; T
1141 REM
1142 REM END OF MAIN SIMULATION LOOP
1143 REM
1150 NEXT J2
1160 STOP
1161 REM
1162 REM INSERTION ROUTINE
1163 REM
1164 REM WHEN T = 0 QUEUE IS FULL, I.E. OVERFLOW
1165 REM ELSE, INSERT N AT TAIL AND DECREMENT TAIL
1166 REM
1170 IF T = 0 GOTO 1220
1180 PRINT " INSERTION";
1190 Q(T) = N
1200 T = T - 1
1210 RETURN
1220 PRINT " OVERFLOW";
1230 RETURN
1231 REM
1232 REM DELETION ROUTINE
1233 REM
1234 REM WHEN T = 5 QUEUE IS EMPTY, I.E. UNDERFLOW
1235 REM ELSE, DELETE N AT HEAD OF QUEUE
1236 REM AND MOVE REMAINING ITEMS TOWARD HEAD
1237 REM
1240 IF T = 5 GOTO 1350
1250 PRINT " DELETION";
1260 T = T + 1
1270 FOR J4 = 5 TO T STEP -1
1280 IF J4 = 1 GOTO 1330
1290 J5 = J4 - 1
1300 Q(J4) = Q(J5)
1310 NEXT J4
1320 RETURN
1330 Q(1) = -9
1340 RETURN
1350 PRINT " UNDERFLOW";


time waiting in the queue before service 
may be rendered. 

The first-in, first-out discipline is but 
one of many ordering schemes that 
queues follow. Other servicing 
disciplines include last-in, first-out (eg: a 
stack of dishes), a priority queue and 
shortest line first or longest line first. 
These are multiple-queuing systems and 
will be discussed later. 
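
The difference between the first-in, first-out and last-in, first-out disciplines can be seen in a short Python sketch. The function names are illustrative, and the standard library `deque` is used here purely as a convenient container, not as the article's own representation.

```python
from collections import deque


def serve_fifo(arrivals):
    """Serve customers in arrival order: first in, first out."""
    line = deque(arrivals)
    return [line.popleft() for _ in range(len(line))]


def serve_lifo(arrivals):
    """Serve the most recent arrival first: last in, first out (a stack of dishes)."""
    pile = deque(arrivals)
    return [pile.pop() for _ in range(len(pile))]
```

Given the arrivals `["A", "B", "C"]`, the FIFO discipline serves A first, while the LIFO discipline serves C first.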


Queue Representation 


How can we represent a queue as part 
of a computer program? The following 
piece of BASIC coding, a one- 
dimensional array, could be used to 
represent a queue in a computer pro- 
gram: 


10 DIM A(100). 


A queue is nothing more than a 
special purpose one-dimensional array. 
Just as the ordinary one-dimensional ar- 
ray is represented as a single row or a 
single column structure n locations long 
or deep, the queue can be represented 
as a single row structure n locations 
long. 


Over and Under 


When an array is dimensioned to 100
locations, the program cannot access 
the 104th or —36th location. These in- 
teger values are not within the boun- 
daries of the dimensioning statement. If 
the program attempts to address out of 
range locations during execution of the 
program, an overflow or underflow con- 
dition occurs. Overflow occurs when a 
location greater than that given in the 
dimensioning statement is addressed. 
Likewise, underflow occurs when a 
negative subscript is given as an address- 
ing value. 

Some BASIC interpreters allow for ad- 
dressing location 0 of an array. If an ar- 
ray is dimensioned to 100 locations, the 
actual number of legally addressable 
locations is 101 (by counting location 0 
as the first available location). 

The program listings in this article do 
not take advantage of this extra 
available array location. The first 
available location is always array loca- 
tion 1, and the last available location is 
equal to the integer value given in the 
dimensioning statement. 

Let’s get back to overflow and under- 
flow as these conditions apply to 
queues. If we assume that our queuing 
program will not address a location 


above or below those given in the
dimensioning statement, overflow and
underflow take on a somewhat different
meaning.

A queue overflow occurs when the
program attempts to insert an item into
our queue and the queue is filled to its
capacity. Underflow in a queue
structure occurs when the program attempts
to delete an item from the queue, but
there are no items in the queue.


Queue Operations


Items in an ordinary one-dimensional
array can have many operations
performed on them. A program can insert
items anywhere within the array, and
items can be removed from any legal
position within the array. Items can be
examined and left in place or moved to
another location within an array.

A queue can have only two operations
performed upon its items. The first of
these allowable operations is the
insertion of an item into the queue. This
insertion can be done only at the tail of
the queue. The second operation allows
for deletion. Deletion is done only at the
head of the queue.
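
The two permitted operations, together with the queue meanings of overflow and underflow, can be summarized in a small Python class. This is a minimal sketch, not the article's BASIC representation: the class and exception names are hypothetical, and a Python list stands in for the dimensioned array.

```python
class QueueOverflow(Exception):
    """Raised when an insertion is attempted on a queue filled to capacity."""


class QueueUnderflow(Exception):
    """Raised when a deletion is attempted on a queue with no items."""


class BoundedQueue:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []

    def insert(self, item):
        """Insertion is allowed only at the tail."""
        if len(self.items) >= self.capacity:
            raise QueueOverflow("queue is filled to its capacity")
        self.items.append(item)

    def delete(self):
        """Deletion is allowed only at the head."""
        if not self.items:
            raise QueueUnderflow("there are no items in the queue")
        return self.items.pop(0)
```

Nothing else is exposed: there is no way to insert into the middle or to remove anything but the head item, which is what separates a queue from an ordinary array.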


The Simple Row Queue


The program shown in listing 1 is a
simulation of a row queue. (See figure 1.)
The mechanics of a row queue follow
the definitions we have seen so far.

The row queue has its tail at location
1 of array Q, while its head is at location
5 of array Q. The choice of these
locations for tail and head is arbitrary. I
chose this scheme because it is easier to
print the queue during execution of
the program in a normal left-to-right
printing fashion.

The head (ie: service facility area) of
the queue of listing 1 is always at
location Q(5). The tail of the queue (ie: the
location in the queue where items will
be inserted) moves from location 5
toward location 0 of array Q as items are
inserted into the queue. When items are
deleted, the tail of the queue moves
from its present value toward location 5.

The tail of the row queue is indicated
by a tail pointer, variable T. When T is
5 the queue is empty, that is, there are
no items in the queue. When T is 0 the
queue is filled to its capacity and no
insertions can be made without causing an
overflow condition.

To simulate the action of a queue
properly, listing 1 generates pseudo-
random numbers to determine queue in-


Listing 1 continued:


1360 RETURN
1370 END

RUN

NUMBER= 7  UNDERFLOW  QUEUE=-9 -9 -9 -9 -9  TAIL= 5
NUMBER= 3  INSERTION  QUEUE=-9 -9 -9 -9  3  TAIL= 4
NUMBER= 7  DELETION   QUEUE=-9 -9 -9 -9 -9  TAIL= 5
NUMBER= 4  INSERTION  QUEUE=-9 -9 -9 -9  4  TAIL= 4
NUMBER= 1  INSERTION  QUEUE=-9 -9 -9  1  4  TAIL= 3
NUMBER= 3  INSERTION  QUEUE=-9 -9  3  1  4  TAIL= 2
NUMBER= 2  INSERTION  QUEUE=-9  2  3  1  4  TAIL= 1
NUMBER= 5  INSERTION  QUEUE= 5  2  3  1  4  TAIL= 0
NUMBER= 2  OVERFLOW   QUEUE= 5  2  3  1  4  TAIL= 0
NUMBER= 8  DELETION   QUEUE=-9  5  2  3  1  TAIL= 1
NUMBER= 7  DELETION   QUEUE=-9 -9  5  2  3  TAIL= 2
NUMBER= 8  DELETION   QUEUE=-9 -9 -9  5  2  TAIL= 3
NUMBER= 3  INSERTION  QUEUE=-9 -9  3  5  2  TAIL= 2
NUMBER= 4  INSERTION  QUEUE=-9  4  3  5  2  TAIL= 1
NUMBER= 9  DELETION   QUEUE=-9 -9  4  3  5  TAIL= 2


Figure 1: Simple row queue. This type of queue has a stationary head
and a moving tail. As data items are deleted from the head, all of the
data items in the queue are moved toward the head, and the tail pointer
is incremented by 1. As more data is entered into the queue at the tail,
the tail pointer is decremented by one location.


Figure 2: Circular queue in three states of use. Figure 2a is an empty 
queue, in which the head pointer and the tail pointer point to the same 
location in the queue. Figure 2b shows a partially filled circular queue. 
The tail pointer moves ahead of the head pointer as data items are add- 
ed to the queue. As an item is deleted, the head pointer moves towards 
the tail pointer. Figure 2c shows a full queue. In this state the tail 
pointer has caught up with the head pointer. Note that one location in 
the queue will be left empty. If this were not done, the next item added 
to the queue would make the head and tail pointers point to the same 
location, which would seem to indicate that the queue was empty. 




Listing 2: BASIC listing for a circular simulation. Lines 1900 through
2100 are the insertion routine; lines 2110 through 2270 are the deletion
routine. A sample run of the program is shown at the end of the listing.


     DIM Q(5)
     REM INITIALIZE QUEUE TO EMPTY STATE
     FOR J2 = 1 TO 5
     Q(J2) = -9
     NEXT J2
     REM INITIALIZE HEAD AND TAIL POINTERS
     REM TO HEAD OF QUEUE LOCATION
     H = 5
     T = 5
     REM START OF MAIN SIMULATION LOOP
     FOR J3 = 1 TO 10
     REM GENERATE A RANDOM NUMBER TO DETERMINE
     REM AN INSERTION WHEN N <= 5
     REM A DELETION WHEN N >= 6
     N = INT(RND(1)*10) + 1
     IF N <= 5 GOSUB 1900
     IF N >= 6 GOSUB 2110
     REM PRINT QUEUE CONTENTS
     REM PRINT TAIL AND HEAD POINTER VALUES
     FOR J4 = 1 TO 5
     PRINT Q(J4);
     NEXT J4
     PRINT " TAIL AT"; T; " HEAD AT"; H
     REM END OF MAIN SIMULATION LOOP
     NEXT J3
     STOP
     REM INSERTION ROUTINE
     REM CHECK TAIL AND HEAD POINTER VALUES
1900 IF H = T GOTO 1970
1910 IF H < T GOTO 2030
1920 IF T >= 1 GOTO 2030
1930 IF H = 5 GOTO 2080
     REM INSERT ITEM AT Q(H)
     REM SINCE QUEUE IS EMPTY
1940 Q(5) = N
1950 T = 4
1960 GOTO 2050
1970 IF T <> 0 GOTO 2000
     REM RESET POINTERS TO HEAD OF QUEUE
1980 H = 5
1990 T = 5
     REM CHECK IF Q(T) EMPTY FOR POSSIBLE INSERT
2000 IF Q(T) <> -9 GOTO 2080
2010 H = 5
2020 T = 5
     REM NORMAL TAIL INSERTION
2030 Q(T) = N
2040 T = T - 1

sertion or deletion. The importance of 
randomness in proper queue operation 
is explained later. 

Before you execute the program in 
listing 1, run through its operations with 
pencil and paper. This approach will 
show you how the program will run 
before the actual operation is simulated 
by the computer. This method will also 
clarify the mechanics of a simple row 
queue operation. 
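
For readers who prefer to trace the mechanics in another language, the row queue can be sketched in Python. This is a loose port under assumptions, not the article's program: the class and method names are hypothetical, the random-number driver is left out, and index 0 of the list is left unused so that the subscripts match Q(1) through Q(5).

```python
EMPTY = -9  # the same marker value listing 1 stores in unused locations


class RowQueue:
    def __init__(self, size: int = 5):
        self.size = size
        self.cells = [EMPTY] * (size + 1)   # cells[0] is unused, as in the listing
        self.tail = size                     # tail starts at the head location

    def insert(self, item) -> str:
        if self.tail == 0:
            return "OVERFLOW"
        self.cells[self.tail] = item
        self.tail -= 1
        return "INSERTION"

    def delete(self) -> str:
        if self.tail == self.size:
            return "UNDERFLOW"
        self.tail += 1
        # Move every remaining item one location toward the stationary head.
        for j in range(self.size, self.tail - 1, -1):
            if j == 1:
                self.cells[1] = EMPTY
                return "DELETION"
            self.cells[j] = self.cells[j - 1]
        return "DELETION"

    def contents(self):
        return self.cells[1:]
```

Feeding it the same insertions and deletions as a pencil-and-paper trace shows the tail pointer falling on each insertion and rising on each deletion, while the head stays at the last location.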


The Circular Queue 


A major disadvantage of our simple 
row queue is the fact that items must be 
moved toward the head of the queue 
after each deletion. [Editor's Note: This 
is not true, however for all implementa- 
tions of a row queue. Often, the pointers 
indicating the head and tail of the row 
queue are moved instead of all the data 
inside the queue ... RGAC] The loop in
line numbers 1270 thru 1310 of listing 1
accomplishes this move. If we’re trying
to represent a queue simulation in a 
computer program, why not use some 
programming techniques to take advan- 
tage of decreasing execution time and 
thereby eliminate some of the unwieldy 
code? 

The circular queue, figure 2, is also 
represented as a special-purpose, one 
dimensional array. The simple row 
queue has a pointer to keep track of the 
location where the next item insertion 
was to take place. The circular queue 
also has this tail pointer. 

The difference between the row and 
circular queue lies in the addition of 
another pointer to indicate the location 
of the head of the queue. The simple 
row queue always has its head at the last 
available location of the array Q. The 
circular queue structure can have its 
head anywhere within the queue. 


Circular Queue Representation 


The circular queue operates in the 
same manner as the simple row queue. 
Items are still inserted into the location 
given as the tail point location of array 


The major difference is in the way 
which the program controls the head 
location of the queue. A new variable 
for head pointer called H points to the 
array location which holds the item 
ready for deletion. 

An item is inserted into the queue at 
the location pointed to by the tail 
pointer. After this insertion, the pointer 
is moved by one location in readiness 


deleted, the head pointer comes into 
“play. In the simple row queue, the head 
is always at the last available location. 
‘In the circular queue, the head of the 
ueue is defined by the value of the 
ead pointer variable H. After an item is 
leleted, the head pointer is moved one 
location toward the value of the tail 
pointer. In this structure, data items re- 
ain stationary; only the pointers vary, 
dicating relative positions of the tail 
‘and the head of the queue. 

This queue structure is clearly advantageous when we're dealing with
long queues. If a row queue is filled to its capacity and an item is
deleted, every remaining item has to be moved, one at a time, toward the
stationary head of the queue. The circular queue moves the head pointer
by only one location, thereby cutting program execution time.

The tradeoff is time versus space. The circular queue program is longer
than the simple row queue; however, the time to execute the circular
queue routine is shorter, since the majority of code execution in the
simple row queue is spent moving the items after a delete operation.

In the circular queue, the tail pointer chases the head pointer during
insertions. During deletions, the head pointer chases the tail pointer.

When the circular queue is filled to capacity, the head and tail
pointers are at adjacent locations. No more items may be inserted,
simply because there is no more available space to fit an item into the
queue. An overflow condition occurs if an insertion is attempted on a
filled queue.

An underflow occurs when the queue is empty and a deletion is attempted.
An empty circular queue is one in which the tail and the head pointers
are at the same location in the array Q.
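
The pointer behavior described above can be sketched in a few lines of Python. This is only an illustration, not a transcription of listing 2: the class and method names are hypothetical, the pointers count upward rather than downward, and one array slot is deliberately left unused so that "empty" (head equals tail) and "full" (tail adjacent to head) can be told apart.

```python
class CircularQueue:
    """Fixed-size circular queue; items never move, only the pointers do."""

    def __init__(self, capacity):
        # One extra slot stays unused so that full and empty look different.
        self.slots = [None] * (capacity + 1)
        self.head = 0  # location of the item ready for deletion
        self.tail = 0  # location where the next item will be inserted

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.tail + 1) % len(self.slots) == self.head

    def insert(self, item):
        if self.is_full():
            raise OverflowError("queue overflow")
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)  # tail chases head

    def delete(self):
        if self.is_empty():
            raise IndexError("queue underflow")
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)  # head chases tail
        return item
```

Both operations take the same small amount of work no matter how long the queue is, which is the time saving the text attributes to the circular structure.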

The program given in listing 2 simulates a circular queue. Again, a
pencil and paper method of initial execution may prove helpful. After
the mechanics of this structure are understood, then execute the
program.

This completes our discussion of two different types of queues and their
representation in a computer. In part 2 we will consider queues in the
world around us and fit them into the structures already developed.


Listing 2 continued:

2050 PRINT " "
2060 PRINT "ARRIVAL"
2070 RETURN
2080 PRINT " "
2090 PRINT "OVERFLOW"
2100 RETURN
2101 REM
2102 REM DELETION ROUTINE
2103 REM
2104 REM CHECK POINTER VALUES FOR POSSIBLE DELETE
2105 REM
2110 IF H = T GOTO 2150
2120 IF H > 0 GOTO 2190
2130 H = 5
2140 GOTO 2180
2150 IF H <> 0 GOTO 2180
2160 H = 5
2170 T = 5
2171 REM
2172 REM DELETE FROM Q(H) IF Q(H) HAS AN ITEM
2173 REM ELSE, QUEUE IS EMPTY, I.E. UNDERFLOW
2174 REM
2180 IF Q(H) = -9 GOTO 2240
2190 Q(H) = -9
2200 H = H - 1
2201 REM
2202 REM RESET POINTERS FOR NEXT DELETE
2203 REM
2210 IF H <> 0 GOTO 2260
2220 H = 5
2230 RETURN
2240 PRINT " "
2250 PRINT "UNDERFLOW"
2260 RETURN
2270 END

RUN

ARRIVAL
-9 -9 -9 -9  3   TAIL AT 4 HEAD AT 5
ARRIVAL
-9 -9 -9  2  3   TAIL AT 3 HEAD AT 5
ARRIVAL
-9 -9  4  2  3   TAIL AT 2 HEAD AT 5
-9 -9  4  2 -9   TAIL AT 2 HEAD AT 4
ARRIVAL
-9  5  4  2 -9   TAIL AT 1 HEAD AT 4
ARRIVAL
 3  5  4  2 -9   TAIL AT 0 HEAD AT 4
ARRIVAL
 3  5  4  2  1   TAIL AT 4 HEAD AT 4
OVERFLOW
 3  5  4 -9  1   TAIL AT 4 HEAD AT 3
ARRIVAL
 3  5  4  3  1   TAIL AT 3 HEAD AT 3

Queuing Theory,
The Science of
Wait Control


Part 2: System Types


Len Gorney


In part 1 we discussed the computer implementation of row and circular
queues. Now, let us take a look at the structure of queues in the real
world and see if they can be fitted to our previous programs. In the
following discussion, the word queue refers to the waiting line in the
system. The word facility refers to the service facility area located at
the head of the queue. (The numbering of the figures and listings is
continued from part 1.)


Types


There are four general types of queuing structures. The first, and
simplest, is the single-queue single-facility system shown in figure 3.
In this structure, there is one waiting line and one service area to be
studied. A one-pump gas station with one entrance is a real world
example of this system.

We can extend this system to the single-queue multifacility system shown
in figure 4. In this structure, customers line up in a single waiting
line and are served at the first of a series of facilities. Upon
departure from the first facility, the customers immediately enter
another queue to await their turn at the second service facility. This
insertion and deletion continues until the customer is eventually
deleted from the last facility and consequently from the entire system.
This structure is not unlike a cafeteria where you first line up for a
sandwich, then line up for dessert, then for a drink, and finally, for
the cash register.

Another basic queue structure is a 
multiqueue single facility system shown 
in figure 5. This is the type of structure 
you see at a typical supermarket 
checkout counter area. Customers arrive 
at the queue with their purchases and 
choose one of many waiting lines. Each 
service facility offers the same service, 
that is, checking out the purchases, but 
each line holds different customers. 

The multiqueue multifacility system 
in figure 6 is a combination of the 
previously mentioned structures. A 
number of initial queues feed into a 
series of facilities. When a customer
enters a particular queue, that customer
travels through each facility within that
subsystem until the eventual deletion
from the system. Once a customer has
entered a subsystem, that subsystem
behaves as does the single-queue
multifacility system.

Any waiting line can be fitted to one 
of the four queue structures just men- 
tioned. Try it the next time you're 
waiting in a line. 

After we are able to define the type of 
queue we have, the problem of analyz- 
ing the structure and arriving at answers 


[Diagram: CUSTOMERS IN -> QUEUE -> SERVICE FACILITY -> OUT]


Figure 3: A single-queue single-facility system with one waiting line and 
one service area. 


[Diagram: CUSTOMERS IN -> QUEUE 1 -> SERVICE FACILITY 1 -> QUEUE 2 -> SERVICE FACILITY 2 -> ... -> OUT]

Figure 4: Single-queue multifacility system, in which the customer waits 
in a queue to use a facility, then waits in another queue for the second 
facility, and so on until all service facilities have been used. 


[Diagram: CUSTOMERS IN, then one of several parallel lines, each QUEUE -> SERVICE FACILITY -> OUT]


Figure 5: Multiqueue single-facility system. An example of such a 
system is the supermarket checkout area. The checkout area has several 
service facilities, each with a corresponding queue, that all offer the 
same service. 


[Diagram, left half: SUBSYSTEM 1 through SUBSYSTEM m, each a chain CUSTOMERS IN -> QUEUE 1 -> SERVICE FACILITY -> QUEUE 2 -> SERVICE FACILITY -> QUEUE 3 -> ...]


most important in queuing problems is 
our next step. At this time we will not 
concern ourselves with the difference 
between a single server or a multiserver 
queue. The former represents a grocery 
store checkout counter arrangement 
where customers enter any line — usual- 
ly the shortest or the fastest moving. The 
latter fits into the situation at a barber- 
shop. One long line feeds into a large 
service area where a number of barbers 
(ie: the servers) wait for you to come to 
them. 

Let’s imagine a one-pump gas station. 
At the start of the day, the operator (ie: 
server) opens the pump and waits for the 
first customer of the day to arrive. After 
some period of time, the first customer 
arrives and immediately drives up to the 
pump for service. This lucky first 
customer has no waiting time since the 
facility at the head of the queue is open 
and free of previous customers. The 
customer requires some period of time 
for service, and upon completion of this 
serving time leaves the system. The 
operator sits back and waits for the next 
customer to arrive. 

The second customer arrives, is im- 
mediately served, and leaves the system. 
If the only time a customer spends in a 
queue is the time required for service, 
no queue forms. For a queue to form,
customers must arrive while another
customer is being serviced. Then a line
of waiting customers will form. The
queue forms based entirely upon the
service requirements of the customer at
the service area.


Randomness 


A pure queuing problem requires that 
customer arrival and service times be 
different. In other words, while a 
customer is being serviced, other 
customers enter the system at random 


[Diagram, right half: in each subsystem the chain ends ... -> QUEUE n -> SERVICE FACILITY n -> CUSTOMER OUT]


Figure 6: Multiqueue, multifacility system. This system has a number of initial queues feeding into a series of facilities. 
A customer entering a particular queue stays within that particular subsystem until leaving the system. 




intervals during the simulation period to 
form a queue. 

Formally speaking, the randomness of these arrivals follows a Poisson
distribution, with exponential interarrival times. Basically, this means
that an arrival has an equal chance of arriving at the tail of the queue
at any time during the simulation period of the problem. Typical
nonqueue structures do not exhibit this random criterion. For example, a
movie theater line is not a good queue problem, since most customers
arrive shortly before the show starts. Therefore, during the simulation
period, randomness is a key factor. Randomness causes the queue to
lengthen and to shorten based only on the service requirements of each
customer.
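
As a small illustration of what "Poisson arrivals with exponential interarrival times" means in practice, the following Python sketch draws the gaps between customers from an exponential distribution and counts how many arrivals land in each minute. The function name, the rate of two customers per minute, and the one-hour period are all assumptions chosen for the example; `random.expovariate` is the standard library routine for exponential variates.

```python
import random


def arrival_times(rate, duration, rng):
    """Arrival instants in [0, duration) for a Poisson process with the given rate."""
    times = []
    t = rng.expovariate(rate)          # gap before the first customer
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)     # exponential gap to the next customer
    return times


rng = random.Random(1)                 # fixed seed so a run can be repeated
times = arrival_times(rate=2.0, duration=60.0, rng=rng)

per_minute = [0] * 60
for t in times:
    per_minute[int(t)] += 1            # which minute did this customer arrive in?

print(len(times), "arrivals; busiest minute had", max(per_minute))
```

The per-minute counts vary from minute to minute even though the average rate is constant, which is what lets a line build up and drain away.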

Usually a customer must wait in a line at any business establishment
before receiving the desired service. How the businessman treats these
waiting customers is of prime importance to the success or failure of
most businesses. A typical customer will take one of the following
actions when faced with a waiting line. The first action is to just wait
in the line until service arrives. Once in line, that customer will
remain in line until the end. The businessman has little worry over this
customer because this customer will eventually be serviced and some
profit will be realized.

A second alternative open to a waiting customer is to jockey from line
to line. How many times have you seen this customer arrive at one queue,
wait for a short period of time, move to another queue, wait again, then
move again, and so on? This situation exists in the multiqueue system,
as evidenced in a bank or large supermarket with many service facilities
available for customer use.

The previous two actions should cause little concern. The customer
remains in the system and will eventually be served, thereby yielding
the business some profit. However, what happens when the customer leaves
the system after entering, or refuses to enter the system initially?

If a customer has entered the system and leaves before being serviced,
that customer has reneged. This situation occurs quite often when the
waiting lines are moving at a rate far too slow for the customers within
the lines. The customer and possible profits are lost to the businessman
when a customer's action takes him or her on this route.


Listing 3: BASIC program that simulates a single-queue single-facility
system such as a one-pump gas station. The program incorporates
several functions discussed in part 1.


1000 DIM Q(10)
1010 PRINT "MINUTES TO RUN SIMULATION=";
1020 INPUT M
1030 PRINT "MAXIMUM ARRIVALS/UNIT TIME=";
1040 INPUT A2
1050 PRINT "MINIMUM SERVICE TIME=";
1060 INPUT S2
1070 PRINT "MAXIMUM SERVICE TIME=";
1080 INPUT S3
1090 PRINT "QUEUE LENGTH=";
1100 INPUT H2
1110 PRINT "INPUT 1 FOR RUNNING OUTPUT, ELSE INPUT 0";
1120 INPUT P
[the statements that initialize the pointers and counters are illegible in the source]
1240 FOR J2 = 1 TO H2
1250 Q(J2) = -9
1260 NEXT J2
1270 GOSUB 1610
1280 FOR J = 1 TO M
1290 FOR J2 = 1 TO H2
1300 IF Q(J2) = -9 THEN 1330
1310 C = C + 1
1320 Q(J2) = Q(J2) + 1
1330 NEXT J2
1340 C2 = C2 + C
1350 IF C <= C3 THEN 1370
1360 C3 = C
1370 C = 0
1380 IF P = 0 THEN 1410
1390 PRINT "PICTURE OF QUEUE AFTER";J;"MINUTES"
1400 GOSUB 1680
1410 IF Q(H) < M3 THEN 1520
1420 M2 = M2 + M3
1430 C4 = C4 + 1
1440 S4 = S4 + S
1450 IF P = 0 THEN 1470
1460 GOSUB 1730
1470 GOSUB 2110
1480 GOSUB 1610
1490 IF P = 0 THEN 1520
1500 PRINT "PICTURE OF QUEUE AFTER DELETE"
1510 GOSUB 1680
1520 A3 = 1
1530 A = INT(RND(1) * A2)
1540 IF A3 > A THEN 1580
1550 GOSUB 1900
1560 A3 = A3 + 1
1570 GOTO 1540
1580 NEXT J
1590 GOSUB 1730
1600 STOP
1610 S = INT(RND(1) * (S3 - S2))
1620 IF Q(H) = -9 THEN 1640
1630 Q(H) = 0
1640 M3 = Q(H) + S
1650 IF P = 0 THEN 1670
1660 PRINT "REQUIRED SERVICE TIME=";S
1670 RETURN
1680 FOR J2 = 1 TO H2
1690 PRINT Q(J2);
1700 NEXT J2
1710 PRINT "TAIL=";T;" HEAD=";H
1720 RETURN
1730 PRINT C4;" FULLY SERVED CUSTOMERS IN";J;"MINUTES"
1740 PRINT "MAXIMUM CUSTOMERS QUEUED=";C3
1750 M5 = M2/C4
1760 PRINT "AVERAGE WAIT TIME=";M5
1770 S5 = S4/C4
1780 PRINT "AVERAGE SERVICE TIME=";S5
1790 C5 = C2/J
1800 PRINT "AVERAGE NUMBER OF QUEUED CUSTOMERS=";C5
1810 RETURN
1850 REM
1860 REM INSERTION ROUTINE
1870 REM
1880 REM CHECK TAIL AND HEAD POINTER VALUES
1890 REM
1900 IF H = T GOTO 1970
1910 IF H < T GOTO 2030
1920 IF T >= 1 GOTO 2030
1930 IF H = H2 GOTO 2080
1931 REM INSERT ITEM AT Q(H)
1932 REM SINCE QUEUE IS EMPTY
1940 Q(H2) = 0
1950 T = H2 - 1
1960 GOTO 2050
1970 IF T <> 0 GOTO 2000
1971 REM RESET POINTERS TO HEAD OF QUEUE
1980 H = H2
1990 T = H2
1991 REM CHECK IF Q(T) EMPTY FOR POSSIBLE INSERT
2000 IF Q(T) <> -9 GOTO 2080
2010 H = H2
2020 T = H2
2021 REM NORMAL TAIL INSERTION
2030 Q(T) = 0
2040 T = T - 1
2050 IF P = 0 THEN 2070
2060 PRINT "ARRIVAL"
2070 RETURN
2080 IF P = 0 THEN 2100
2090 PRINT "OVERFLOW"
2100 RETURN
2101 REM
2102 REM DELETION ROUTINE
2103 REM
2104 REM CHECK POINTER VALUES FOR POSSIBLE DELETE
2110 IF H = T GOTO 2150
2120 IF H > 0 GOTO 2190
2130 H = H2
2140 GOTO 2180
2150 IF H <> 0 GOTO 2180
2160 H = H2
2170 T = H2
2171 REM DELETE FROM Q(H) IF Q(H) HAS AN ITEM
2172 REM ELSE, QUEUE IS EMPTY, I.E. UNDERFLOW
2180 IF Q(H) = -9 GOTO 2240
2190 Q(H) = -9
2200 H = H - 1
2201 REM RESET POINTERS FOR NEXT DELETE
2210 IF H <> 0 GOTO 2260
2220 H = H2
2230 RETURN
2240 IF P = 0 THEN 2260
2250 PRINT "UNDERFLOW"
2260 RETURN


The last, and most damaging to the
businessman, is the situation where a 
customer does not initially enter the 
system. When a customer sees a long 
and slow moving line, that customer 
usually balks. This customer is surely 
lost because he does not even give the 
businessman a chance at the very 
outset. 

Since time is money, the important 
questions relating to queuing systems 
must be solved with relation to the time 
involved in waiting and servicing 
customers. 

What is the maximum amount of time 
a customer waits in a line? What is the 
average amount of time all the 
customers are expected to wait in line 
before being served and deleted? What 
is the maximum amount of service time 
for any one customer during a typical 
period of time? Any measurement in- 
volving customer waiting time and 
customer service time is vital to the suc- 
cess or failure of a business. 


A Queuing Problem 


The program shown in listing 3 is that 
of a typical queuing problem utilizing 
the circular queue as the queuing struc- 
ture. What we may have here is a 
hypothetical one-pump gas station. The 
system will therefore be described as a 
single-queue single-facility structure. 

Past experience gives us some of the 
input parameters required for the prob- 
lem solution. For example, our queue is 
dimensioned to ten locations, so only 
ten cars can fit in our service area. This 
parameter can be adjusted using input 
parameter questions at the beginning of 
the program. In addition to the queue 
length, the program asks for the 
minimum and maximum typical service 
times. The arrivals per unit time deter- 
mine how many customers are arriving 
each minute during the simulation. The 
simulation is halted after the first 
parameter value is reached, namely, the 
amount of time to run the model. 
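
The same kind of model can be sketched compactly in Python. This is not a translation of listing 3: the function name, the dictionary of results, and the seed parameter are hypothetical, the line is held in a `deque` rather than a hand-managed circular array, and service times are assumed to be whole minutes drawn uniformly between the minimum and maximum (with a minimum of at least 1). The number of arrivals per minute mirrors the listing's INT(RND(1) * A2), that is, a whole number from 0 to one less than the stated maximum.

```python
import random
from collections import deque


def simulate(minutes, max_arrivals, min_service, max_service, queue_length, seed=0):
    """Minute-by-minute model of one waiting line feeding one service facility."""
    rng = random.Random(seed)
    waiting = deque()      # arrival minute of each customer still in line
    remaining = 0          # minutes of service left for the customer at the pump
    started = served = turned_away = 0
    total_wait = longest_line = 0

    for minute in range(minutes):
        # The customer at the pump uses up one minute of service.
        if remaining > 0:
            remaining -= 1
            if remaining == 0:
                served += 1

        # Between 0 and max_arrivals - 1 customers arrive at the tail.
        for _ in range(rng.randrange(max_arrivals)):
            if len(waiting) < queue_length:
                waiting.append(minute)
            else:
                turned_away += 1   # overflow: no room left in the line

        # If the pump is free, the head of the line moves up for service.
        if remaining == 0 and waiting:
            total_wait += minute - waiting.popleft()
            remaining = rng.randint(min_service, max_service)
            started += 1

        longest_line = max(longest_line, len(waiting))

    return {
        "served": served,
        "started": started,
        "turned_away": turned_away,
        "longest_line": longest_line,
        "average_wait": total_wait / started if started else 0.0,
    }


stats = simulate(minutes=480, max_arrivals=3, min_service=1,
                 max_service=4, queue_length=10)
print(stats)
```

Running it with different service-time limits or a different queue length shows how quickly the line fills and how many customers are turned away, which are the questions the article poses for the gas station.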


Conclusion 


For the serious reader, the list of 
reference material includes those texts 
that place a good emphasis on queuing 
theory. After digesting the ideas in this 
article, plunge into these texts. Now I
can return to my reading queue and get 
to those lines of books and articles wait- 
ing on my bookshelf. I’m sure that 
somewhere a line is waiting for you! 


BIBLIOGRAPHY 


1. Cooper. Introduction to Queueing Theory. Macmillan, New York, 1972.

2. Cox, Smith. Queues. John Wiley and Sons, New York, 1961.

3. Gross, Harris. Fundamentals of Queuing Theory. John Wiley and Sons,
   New York, 1974.

4. Harrison. Data Structures and Programming. Scott, Foresman,
   Glenview IL, 1973.

5. Hillier, Lieberman. Operations Research. Holden-Day, San Francisco,
   1974.

6. Siemens, Marting, Greenwood. Operations Research. Macmillan,
   New York, 1973.

7. Wagner. Principles of Operations Research. Prentice-Hall,
   Englewood Cliffs NJ, 1975.




An Introduction to BNF


W D Maurer


[...] made about a programming language when it is being strictly
defined, as in a [...] manual. As such, BNF [...] which are made about
[...] and mathematical quantities. Thus the statement that the volume of
a sphere is equal to four-thirds the cube of the radius times the ratio
of the circumference of a circle to its diameter may be abbreviated as:


V = (4/3)πr³


In order to make abbreviations such as the one above, we set up various
conventions. For example:


1. The quantities in the statement are represented by single letters;
thus, V stands for the volume.

2. Squares, cubes, and other powers are represented by superscript
notation; thus, r³ is the cube of r.

3. Certain fixed quantities which appear very often have standard names;
thus, the ratio of the circumference of a circle to its diameter is
always denoted by π.

4. Two single letters written together signify "times"; thus, πr means π
times r. (This rule must be amplified in order to specify clearly that
πr³ means π times the cube of r, and not the cube of πr; and to make
clear that it also applies to numbers, so that 4π means 4 times π.)

We shall now set up a number of 
similar conventions in order to ab- 
breviate statements made about pro- 
gramming languages. For example, con- 
sider the following sentence: 


A GO TO statement in FORTRAN consists of the words GO TO followed by a
statement number.


We may abbreviate this in BNF as follows:


<GO TO statement> ::=
    'GO TO' <statement number>


In doing this we have implicitly set up 
the following conventions: 


1. The signs < and >, which also 
stand for “less than” and “greater than” 
but in this context are called angle 
brackets, enclose the name of some 
“quantity” which we wish to define in 
the programming language. We call 
such a “quantity” a syntactical variable. 

2. The special sign ::= means "is
defined as." This comes from ALGOL, in
which the sign := is the replacement
symbol, used in statements such as
A := B (ie: set A equal to B).

3. The words “followed by” may be 
omitted, just as “times” may be omitted 
in algebra. 

There is a further analogy between 
BNF and algebra. When we write ‘GO 
TO’ <statement number>, we mean 




the words GO TO followed by any state- 
ment number. This is very much like 
writing 3x to mean 3 times the value of x, 
whatever it happens to be. Here, x is a 
variable, but 3 is a constant. Similarly, 
the phrase < statement number > is a 
syntactical variable, and may stand for 
any of various statement numbers; but 
‘GO TO’ always stands for the same 
thing, and may thus be called a syntac- 
tical constant. Syntactical constants are 
subject to another rule: 


4. A syntactical constant is enclosed 
in quotes. 

This last rule, incidentally, is not 
always followed. The single quote 
character ‘ is actually meaningful in 
some programming languages, and its 
use in programming-language definition 
would cause confusion here. Of course, 
we can always use the double quote “ in- 
stead of the single quote, unless the pro- 
gramming language uses both of these 
symbols (like SNOBOL, for instance). 
But sometimes even when there is no 
confusion, the quotes are omitted for 
the sake of brevity. 

It is clear, of course, that statements 
about programming languages may be 
abbreviated even further. We might 
write G → 'GO TO' S, thus incorporating
the use of single letters for variables, as 
is done in algebra. In fact, this type of 
abbreviation is used extensively in the 
theory of context-free languages. (See 
references 2 and 3 for two interesting ap- 
plications of this theory and this type of 
abbreviation to programming 
languages.) The trouble with ab- 
breviating this far is that now the ab- 
breviation is not self-contained. We 
must still make some statement such as 
“where S stands for a statement
number.” In contrast, the BNF rules 
which we define here permit the entire 
syntax, or “grammar rules” of a 
language, to be specified in a precise 
manner, using no other information than 
that contained in the BNF rules 
themselves. The semantics, or “mean- 
ing” of the language, must still be 
specified separately; and at this time 
there is no easy and fairly universal way 
to specify semantics, although attempts 
have been made. (See references 2 and
4.)

Rules in BNF may be extremely sim- 
ple. We may write: 


<statement number> ::=
    <unsigned integer>


to specify that the syntactical variable 


“statement number” takes the same 
form as the syntactical variable “un- 
signed integer.” This is often convenient 
when several syntactical variables have 
the same form. In most languages, for 
example, simple variable names, array 
names, and function (or subroutine or 
procedure) names all follow the same
rules about starting with a letter, etc,
and we define each of them to be the
same as the syntactical variable
<identifier>.

Sometimes, in a definition of this 
type, there will be more than one alter- 
native. For example, let us make a 
definition of “integer” not restricted to 
unsigned integers. If we already know 
what an unsigned integer is, we may use 
the following: 


An integer is an unsigned integer
optionally preceded by a plus or a
minus sign.


The conventions which we have used 
thus far do not allow for the words op- 
tionally or preceded by, although fol- 
lowed by is permitted. Therefore, let us 
make an equivalent definition, which is 
slightly longer: 


An integer is either: (1) an unsigned 
integer; or (2) a plus sign followed 
by an unsigned integer; or (3) a 
minus sign followed by an unsigned 
integer. 


Now all we need is a symbol for “or.” 
The symbol we use is the vertical line |. 
Thus our abbreviated definition is: 


<integer> ::= <u.i.> | '+' <u.i.>
            | '-' <u.i.>


where we have used "u.i." for "unsigned
integer" in order to keep the definition
from running off the end of the line. Ac- 
tually, this precaution is not necessary. 
Rules in BNF, just like statements in 
ALGOL, may run to several lines, and 
position on a given line is immaterial, 
although, in practice, a definition will be 
started at the beginning of a new line. 
Thus: 


<integer> ::= <unsigned integer>
            | '+' <unsigned integer>
            | '-' <unsigned integer>


is a self-contained BNF rule equivalent 
to the one given above. 

The vertical line is often used for 
“lowest-level” definitions, in which a 
syntactical variable is being defined as 


one of a certain collection of characters. Thus:


<digit> ::= '0' | '1' | '2' | '3' | '4'
          | '5' | '6' | '7' | '8' | '9'


is a very common definition. Note that this defines only a single digit,
not an arbitrary integer; 63, for example, is not a digit by this
definition. We may, if we wish, define "letter" in the same way, as any
one of the twenty-six letters of the alphabet. We may even define
"alphanumeric character" as any one of thirty-six different symbols,
although what is usually done is to define "letter" and "digit" first,
and then to define:


<alphanumeric character> ::=
    <letter>
  | <digit>


The definition of an integer, or of an identifier, is slightly more
complex. An unsigned two-digit integer may be defined very simply:


<unsigned two-digit integer> ::=
    <digit> <digit>


Similarly, an identifier containing exactly two characters may be
defined:


<2-character identifier> ::=
    <letter> <alphanumeric character>


Extensions to three characters, four characters, etc, are easy enough to
visualize; and now, using the vertical line and a few auxiliary
abbreviations, we may put together a definition of an identifier which
has six characters or less. (See figure 1.) In a similar way, we may
construct a definition of an unsigned integer containing at most eleven
digits, or however many digits are permitted on a given computer.

This kind of construction, however, fails when we do not wish to put any
limit whatsoever on the number of digits in an unsigned integer or on
the number of characters in an identifier. In addition, it is overly
cumbersome even in the form given above. Therefore, we must call on some
new resource. This has actually been done, historically, in two
different ways. One way is designed for languages such as ALGOL, LISP,
and SNOBOL, in which most constructions do not have length limitations.
The other way is intended for FORTRAN and for simplified versions of
ALGOL, as well as for various other languages, in which length
limitations do exist. We shall consider these in historical order.

The response of the ALGOL group to 
this problem was to use the resources of 
mathematics, which rescue us (as they 
do so often) with what looks like magic. 
The trick is to use recursive definitions, 
which use the quantity being defined in 
the definition itself. Consider, for exam- 
ple, the following definition: 


<unsigned integer> ::=
    <digit> <unsigned integer>
  | <digit>


Those without a background in 
mathematical logic may need a con- 
siderable amount of time to convince 
themselves that this definition of “un- 
signed integer” defines that syntactical 
variable, in a perfectly valid manner, to 
be a sequence of digits of any length 
whatsoever. The argument goes as 
follows: 


1. A digit is an unsigned integer by 
the above definition. 

2. A two-digit number is a digit 
followed by another digit, and the 
second digit, by the sentence above, is 
an unsigned integer. Therefore a two- 
digit number is an unsigned integer. 

3. A three-digit number is a digit 
followed by a two-digit number; a two- 
digit number is an unsigned integer by 
the previous sentence; therefore, a 
three-digit number is an unsigned in- 
teger. 

4. A four-digit number is a digit 
followed by a three-digit number, and so 
on; the argument may thus be extended 
indefinitely, with each sentence being 
used in the proof of the next. 
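
The argument above can be mirrored almost word for word by a recursive function. The Python sketch below is only an illustration; the function name is hypothetical, and it simply tries the two alternatives of the rule in turn.

```python
DIGITS = "0123456789"


def is_unsigned_integer(text):
    """<unsigned integer> ::= <digit> <unsigned integer> | <digit>"""
    if len(text) == 0:
        return False                       # neither alternative allows nothing
    if text[0] not in DIGITS:
        return False                       # both alternatives start with a <digit>
    if len(text) == 1:
        return True                        # second alternative: a single <digit>
    # First alternative: a <digit> followed by an <unsigned integer>.
    return is_unsigned_integer(text[1:])


print(is_unsigned_integer("63"))    # True
print(is_unsigned_integer("6A"))    # False
```

Each call hands a string one character shorter to the next call, just as each sentence of the argument relies on the sentence before it.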


<an> ::= <alphanumeric character>
<identifier> ::= <letter>
    | <letter><an>
    | <letter><an><an>
    | <letter><an><an><an>
    | <letter><an><an><an><an>
    | <letter><an><an><an><an><an>


Figure 1: An identifier which has six characters or less.



Another common recursive definition 
is: 


<identifier> ::= <letter>
    | <identifier> <letter>
    | <identifier> <digit>


This one is actually easier to understand if the last two alternatives
are combined into a single alternative, <identifier> <an>, where <an>
means "alphanumeric character" and is defined as either a letter or a
digit. Using this syntactical variable, we may rewrite the definition of
an identifier as:


<identifier> ::= <letter>
    | <identifier> <an>


That this constitutes a valid definition 
may be seen as follows: 


1. A letter is an identifier by the 
above definition. 

2. An identifier with two characters 
consists of a letter, which is an identifier, 
followed by an alphanumeric character. 
Therefore, by the second part of the 
above definition, it is an identifier. 

3. An identifier with three characters 
consists of an identifier with two 
characters followed by an alphanumeric 
character. Therefore, by the definition 
above, it is an identifier. This argument 
may then be extended to identifiers with 
four, five, or any number of characters. 

Note that this definition of an iden- 
tifier involves various other identifiers 
whose names are contained within it. 
Thus the word TAU is an identifier partly 
because T is an identifier and so is TA. 
This fact may lead to confusion because 
we are constantly reminded, when 
studying programming languages, that 
the individual characters in an identifier 
have no separate meaning. Thus TAU is 
not (by definition) T times A times U, or 
TA times U, or T times AU. Nevertheless, 
TAU is a properly formed identifier 
because T and TA are — just as 2PI is 
not a properly formed identifier because 
2 and 2P are not. 
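
The remark that TAU is an identifier because T and TA are can be checked mechanically. In this hypothetical Python sketch the recursion works from the right-hand end of the string, following the rule <identifier> ::= <letter> | <identifier> <an>; restricting letters to upper case is an assumption made to match the examples in the text.

```python
import string

LETTERS = string.ascii_uppercase           # assumption: upper-case letters only
ALPHANUMERICS = LETTERS + string.digits    # the thirty-six <an> characters


def is_identifier(text):
    """<identifier> ::= <letter> | <identifier> <an>"""
    if len(text) == 1:
        return text in LETTERS             # first alternative: a single <letter>
    if len(text) > 1:
        # Second alternative: a shorter <identifier> followed by one <an>.
        return is_identifier(text[:-1]) and text[-1] in ALPHANUMERICS
    return False


for word in ("T", "TA", "TAU", "2PI"):
    print(word, is_identifier(word))
```

The call for TAU succeeds only after the calls for TA and T succeed, and the call for 2PI fails because the chain bottoms out at 2, which is not a letter.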

Still another common recursive defin- 
ition is: 


<argument list> ::=
    <expression> ',' <argument list>
  | <expression>


This defines an argument list to be a series of expressions separated by
commas. It may be used in various ways; for example, the BNF rule:


<function reference> ::=
    <function name> '(' <argument list> ')'


is one way of defining a function reference (ie: a use of a function,
such as SIN(T*U - B) or ATAN2(X2 - X1, Y2 - Y1)).

The justification for the definition of 
an argument list is: 


1. A single expression is an argument 
list. 

2. Two expressions separated by one 
comma may be thought of as the first ex- 
pression followed by a comma followed 
by an argument list, namely the second 
expression. Therefore, this is an argu- 
ment list. 

3. Three expressions separated by 
commas may be thought of as the first 
expression followed by a comma fol- 
lowed by what remains — namely, the 
second and third expressions separated 
by a comma. This much is an argument 
list, by the sentence above. Therefore, 
three expressions separated by commas 
constitute an argument list. The same 
argument may be used for four, five, etc, 
expressions separated by commas. 


The recursive examples given above 
are written in what may be called pure 
BNF. The alternative is to add some new 
conventions to BNF which take care of 
the recursive cases. This brings us to the 
second possible response to the problem 
of representing sequences in BNF. For 
example, an unsigned integer which is a 
sequence of from one to thirteen digits 
might be written: 


<unsigned integer> ::= <digit>₁¹³


and an unsigned integer which is a sequence of an arbitrary number of
digits (at least one) might be written:


<unsigned integer> ::= <digit>₁


Similarly, an identifier of arbitrary length is given by:


<identifier> ::=
    <letter> <alphanumeric character>₀


Thus a subscript after any syntactical 
variable stands for a minimum number 
of repetitions, while a superscript after 
such a variable stands for a maximum 
number of repetitions. 
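
Readers who know regular expressions may recognize this idea: a minimum and a maximum repetition count attached to an item. As a rough analogy only (regular expressions are not part of BNF, and the pattern names here are hypothetical), the three rules above correspond to the following Python patterns, where `{m,n}` plays the role of subscript m and superscript n.

```python
import re

# <digit> repeated at least 1 and at most 13 times
UNSIGNED_1_TO_13 = re.compile(r"[0-9]{1,13}")
# <digit> repeated at least once, with no upper limit
UNSIGNED_ANY = re.compile(r"[0-9]{1,}")
# one <letter>, then <alphanumeric character> repeated zero or more times
IDENTIFIER = re.compile(r"[A-Z][A-Z0-9]{0,}")


def matches(pattern, text):
    """True when the whole of text fits the pattern."""
    return pattern.fullmatch(text) is not None


print(matches(UNSIGNED_1_TO_13, "1234567890123"))    # thirteen digits
print(matches(UNSIGNED_1_TO_13, "12345678901234"))   # fourteen digits
print(matches(IDENTIFIER, "TAU"))
```

No recursion is needed in any of these patterns, which is the convenience the subscript and superscript notation is meant to provide.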


The use of subscripts and superscripts in this way solves two problems
at once. It makes syntactical variables whose lengths are strictly
bounded much easier to represent in BNF. Also, by replacing many of the
recursive uses of BNF with nonrecursive uses, it frees the user from
having to "think out" these recursive definitions. Nevertheless, the
subscript notation does not allow us to eliminate all recursion. This
will become clear in the examples which we now consider, in which
expressions are defined in BNF.

There are many types of expressions in programming languages. In ALGOL
we have conditional expressions, relational expressions, and Boolean
expressions, in addition to the standard simple arithmetic expression.
Some of these are easy to define in terms of others. For example, a
relational expression (such as P>Q in an if statement) is defined by:


<relational expression> ::=
    <arithmetic expression>
    <relational operator>
    <arithmetic expression>


where a relational operator may be any one of the six relations (eg:
greater than, less than, etc) expressed in a manner which depends on the
language used. In any sort of expression containing operators and
possibly nested parentheses, however, the definition will be much more
complex. We shall indicate here how to define simple arithmetic
expressions in BNF; the basic technique used here may be used in other
kinds of expressions.

What is a simple arithmetic expression? Clearly it is not simply any
combination of identifiers, constants, operators, and parentheses. We
may specify certain rules (eg: the parentheses have to balance, an
operator cannot be the last character in the expression, etc) but it is
difficult to know when we have specified all of them. One key to
defining expressions is obtained by means of the precedence rules. You
have probably seen these, although possibly not by this name; these are
the rules that specify that multiplication is performed before addition,
and the like. Thus in order to evaluate an expression (without
parentheses) there are three basic steps:


1. Perform all the exponentiations.

2. Perform all the multiplications and divisions.

3. Perform all the additions and subtractions.


We shall incorporate these steps into 
our definition of an expression by defin- 
ing four separate syntactical variables; 
primary, factor, term, and expression. 
Factors are made up of primaries; terms 
are made up of factors; and expressions 
are made up of terms. Thus, performing 
all the exponentiations in an expression 
corresponds to grouping the primaries 
into factors, and similarly for the other 
two steps mentioned above. 

In order to define a factor as a collection of primaries separated by
exponentiation signs, we note that this is similar to defining an
argument list as a set of expressions separated by commas. We need only
substitute "exponentiation sign" for "comma," and "primary" for
"expression." Thus the definition is:


<factor> ::= <primary> '↑' <factor>
           | <primary>


We could also rewrite this definition in 
the other mode: 


<factor> ::= <primary> ['↑' <primary>]*


In other words, a factor is a primary followed by any number of
occurrences, including none, of an up-arrow (↑) followed by another
primary. This illustrates another feature of extended BNF: the use of
square brackets, the signs [ and ]. Square brackets in BNF serve
roughly the same function as parentheses do in algebra, and the use of
square brackets in BNF is sometimes called factoring.

We continue our definition of an ex- 
pression by defining a term as a collec- 
tion of factors separated by multiplica- 
tion and division signs. One way to do 
this is to define a “multiplication 
operator,” or “mulop,” as either * or /. 
The definition can then take the same 
form as before (here we illustrate only 
the recursive formulation): 


<term> ::= <factor> <mulop> <term>
         | <factor>


An expression would then be defined in a similar way, using "adop" for
either + or -:


<expression> ::= <term> <adop> <expression>
               | <term>


where we have used "expression" as short for "simple arithmetic
expression." Alternatively, both of these could be written out:


<term> ::= <factor> '*' <term>
         | <factor> '/' <term>
         | <factor>


<expression> ::= <term> '+' <expression>
               | <term> '-' <expression>
               | <term>


The only question remaining is what we 
mean by “primary.” This differs from 
one language to another; roughly speak- 
ing, a primary is one of the “elementary” 
constructions which are connected by 
the operators, such as a constant, a 
variable, a subscripted variable, or a 
function reference. There is always, 
however, one special type of “primary” 
which takes care of parentheses. 

Up to now we have considered only 
expressions without parentheses. In one 
sense, when we introduce parentheses 
into an expression, we sometimes 
violate the precedence rules upon which 
we have built the entire preceding con- 
struction. Thus in the expression: 


A-B*C/(D-E)


we do not perform all the multiplications and divisions first. In fact,
the subtraction D-E must be performed before the division of C by the
result.

This expression, however, may also be thought of as:


A-B*C/F


where F stands for (D-E). In this new expression, we do perform all the
multiplications and divisions before the additions and subtractions.
The process of substituting F for (D-E) may suggest
to us that an expression in parentheses 
can be treated as if it were a single 
primary. This is also true for more than 
one level of parentheses. Thus, the ex- 
pression: 


A+B*(C-D*(E+F/G)-H)+I


may be thought of as the expression:


A+B*J+I


where J stands for the expression:


C-D*K-H


in which K stands for the expression E+F/G. In each of these three
expressions (E+F/G, C-D*K-H, and A+B*J+I) the precedence rules apply;
and the last two of these contain primaries (K and J) that take the
form of an entire expression in parentheses. The BNF description of an
expression is now completed by adding this form of a primary to the
definition. Thus:


<primary> ::= <constant>
            | <variable>
            | <array reference>
            | <function reference>
            | '(' <expression> ')'


is a sample definition of a primary in 
which the syntactical variables “con- 
stant,” “variable,” etc, should be further 
defined according to the rules of the 
particular language under considera- 
tion. 
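
As an illustration only, the four syntactical variables can be turned
almost mechanically into a recursive recognizer, one function per
variable. The sketch below is in Python, which is not a language used in
this article. It assumes that a primary is a single letter, a single
digit, or a parenthesized expression, and that the character ^ stands in
for the up-arrow; the function name recognize is hypothetical.

```python
def recognize(text):
    """Return True if text is a simple arithmetic expression.

    Assumptions for this sketch: a primary is one letter, one digit,
    or '(' <expression> ')'; '^' stands in for the up-arrow.
    """
    chars = text.replace(" ", "")
    pos = 0

    def peek():
        # Next unread character, or "" at the end of the input.
        return chars[pos] if pos < len(chars) else ""

    def primary():
        nonlocal pos
        if peek().isalnum():
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not expression() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    def factor():
        # <factor> ::= <primary> '^' <factor> | <primary>
        nonlocal pos
        if not primary():
            return False
        if peek() == "^":
            pos += 1
            return factor()
        return True

    def term():
        # <term> ::= <factor> <mulop> <term> | <factor>
        nonlocal pos
        if not factor():
            return False
        if peek() in ("*", "/"):
            pos += 1
            return term()
        return True

    def expression():
        # <expression> ::= <term> <adop> <expression> | <term>
        nonlocal pos
        if not term():
            return False
        if peek() in ("+", "-"):
            pos += 1
            return expression()
        return True

    return expression() and pos == len(chars)


print(recognize("A+B*(C-D*(E+F/G)-H)+I"))
print(recognize("A+*B"))
```

The recognizer only answers whether a string fits the grammar; it does
not evaluate anything, so the right-recursive form of the rules causes
no difficulty here.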

One final and important fact about BNF is that although widely used, it
is not universal enough to describe the syntax of every well-known
programming language. In particular Common Business Oriented Language
(COBOL) contains certain constructions which are not amenable to BNF.
The formal definition of the PL-I language, as shown in reference 5, is
made according to a separate set of abbreviation conventions which are
similar, but not identical, to BNF.


REFERENCES

1. Naur, P, ed. "Report on the Algorithmic Language ALGOL 60."
   Communications of the ACM, volume 3, May 1960, pages 299 thru 314.

2. Knuth, D E. "Semantics of Context-Free Languages." Math Systems
   Theory, volume 2, number 2, 1968, pages 127 thru 145.

3. Knuth, D E. "On the Translation of Languages from Left to Right."
   Inf Contr, volume 8, Oct 1965, pages 607 thru 639.

4. Wirth, N, and H Weber. "Euler, A Generalization of ALGOL, and Its
   Formal Definition." Communications of the ACM, volume 9, January and
   February 1966, pages 13 thru 25 and 89 thru 99.

5. IBM Form C28-6571-4. IBM System/360 Operating System, PL-I: Language
   Specifications.


Aids for Hand-Assembling Programs


Erich A Pfeiffer PhD


Resident assembler programs and interpreters for high-level languages
are increasingly available for microcomputer systems based on the more
popular microprocessors. Nevertheless, many operators of small
microcomputer systems are unable to use such programs because their
systems are not large enough to support them. Unless they are lucky
enough to have access to a time-sharing service or to some larger
computer which supports a cross assembler, their only way of developing
a usable object program is to assemble it by hand.

While the mere idea of such an endeavor might horrify any programmer
who is used to working with large machines, the hand assembly of
shorter programs for 8-bit microprocessors actually is not very
difficult. It has been my experience that the assembly of programs can
be greatly simplified and the likelihood of errors can be reduced by
using some simple aids in the assembly process.

One of these aids is in the form of hardware and consists of a special
program assembly form. The software aids are several short utility
routines which run even on the smallest microcomputer systems.
Development of the assembly method described in this article is based
on experience gained from working with programmable calculators of the
keyboard language type. Matt Biever of the Pro-Log Corporation has long
been advocating some of the techniques that I am using. The article's
assembly method is used for program development for a KIM-1
microcomputer. It can be adapted easily for other microcomputer systems
as long as they use an 8-bit processor. The assembly method will be
demonstrated with a sample program.

Before writing a program, it is a good 
idea to put down in writing what the pro- 
gram is supposed to do. Such a program 
description, as shown in listing 1, might 
state any limitations on the magnitude 
of variables used or might indicate what 
happens if these limitations are exceed- 
ed. 

The next step is to develop a concept 
of the program in the form of a flow- 
chart as in figure 1. While the symbols 
used in such charts are standardized, 
the chart’s degree of detail is a mat- 
ter of personal preference. From pro- 
gram descriptions and flowcharts, one 
can determine how many memory loca- 
tions or registers will be necessary to 
store data and temporary results. These 


BRAVEC

The program takes a 16-bit number ORigin and adds 2 to it. The new
number then is subtracted from another 16-bit number, DEstination. The
difference, which may be positive or negative, in two's complement, is
stored in POINTL.

The difference is also examined to determine if it is larger than +127
(if positive) or smaller than -127 (if negative). If this is the case,
FF is loaded into POINTH; otherwise 00 is loaded. POINTH and POINTL are
then displayed by transferring control to the (KIM) operating system.


Listing 1: Program description for BRAVEC. This description should be
the first step taken when writing a program.
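
The arithmetic in this description can be checked away from the
microcomputer. The following is a minimal sketch in Python, not the
author's 6502 program; the function name branch_vector is hypothetical,
and the range test follows the description above literally (+127 and
-127), although the 6502 relative addressing mode itself also accepts an
offset of -128.

```python
def branch_vector(origin, destination):
    """Return (POINTH, POINTL) as described in listing 1.

    origin      -- 16-bit address of the branch op code
    destination -- 16-bit address of the branch target
    """
    # The offset is counted from the byte after the 2-byte branch.
    difference = destination - (origin + 2)

    # Low-order byte of the difference in two's complement.
    pointl = difference & 0xFF

    # Flag byte: FF if the difference is outside the stated range.
    out_of_range = difference > 127 or difference < -127
    pointh = 0xFF if out_of_range else 0x00
    return pointh, pointl


# Forward branch from 0010 to 0020, backward branch from 0020 to 0010.
for org, dest in ((0x0010, 0x0020), (0x0020, 0x0010)):
    high, low = branch_vector(org, dest)
    print(format(high, "02X"), format(low, "02X"))
```

The two calls give offsets of 0E and EE, each with a flag byte of 00.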


[Figure 1 flowchart. Recoverable box labels, in order: START; subtract
result from DEstination; store difference in POINTL; negative
difference? (two decision boxes); load FF into POINTH, then transfer to
MONITOR; load 00 into POINTH, then transfer to MONITOR.]


Figure 1: Flowchart of the program described in listing 1. The circled
numbers refer to the comment numbers in listing 2.


Use                       Label    Location
ORigin                    ORLO     0000
                          ORHI     01
DEstination               DELO     02
                          DEHI     03
"open cell"               POINTL   FA     (from listing of KIM monitor)
                          POINTH   FB
Transfer to KIM monitor   START    1C4F   (from listing of KIM monitor)


Table 1: Program register table for program BRAVEC. This table contains
descriptions of all memory locations used by the program.


locations should be written in the program register table as shown in
table 1.
This table also contains the addresses of 
subroutines or registers of the monitor- 
ing system that are called by the pro- 
gram, or of peripheral interface adapter 
(PIA) registers that will be addressed. 
The table is similar to the symbol table 
printed by the computer during the 
machine assembly of a program. 

After a program description is developed the actual writing of the
program can begin. A programmer who writes a symbolic listing for
machine assembly arranges the program in the form of lines. Each line
is successively numbered, contains one mnemonic for an operation
(unless it is an "all comment" line), and later will be punched into
one punch card for computer entry. Because the operation described by
the mnemonic can have a length of one, two, or three bytes, each line
eventually results in one, two, or three bytes of machine code.
Therefore, there exists no simple relation between the line number and
the address at which the machine code is stored in the computer memory.
For the hand assembly of programs, it is advantageous to use a
different format for the program listing in which there is a one-to-one
relationship between program line and memory location. The writing of
the symbolic program and the assembly into machine code is greatly
simplified by the use of a special program-assembly form. The form I
developed for our KIM-1 system is shown in listing 2. (Similar forms
are available from the Pro-Log Corporation; order Nr CF-1.) Each line
of the coding form corresponds to one memory location with the least
significant hexadecimal digit of the address preprinted in the ADD
column. The form can be used with any computer system that uses a
hexadecimal machine code. For octal notation, a different layout is
advantageous.

The programmer starts writing a program by adding the other digits of
the program-starting address in the ADD and Page columns. It should be
noted that the Page column refers to memory pages while the Page-of
heading indicates pages of coding forms. The program is written by
entering the mnemonic of the first instruction into the MNE column of
line 0. Many of the instructions of a microprocessor can occur in more
than one addressing mode. During machine assembly, the assembler
program deduces the addressing mode from the format of the operand or
the definition of a symbol. When hand-assembling a program it is
advantageous to specify the addressing mode in the Mode column.
Immediate mode addressing is commonly indicated by the symbol #. For
other addressing modes, suitable abbreviations of the column headings
in the programmer's reference card should be used. For operations which
have only one addressing mode, the Mode column is left empty. The
addressing mode determines how many address bytes will have to follow
the op code byte. After filling in the Mode column, the programmer
should cross out the appropriate number of lines in the MNE column.
This reserves the corresponding memory locations for the address or
operand part of the instruction.

The Label column will carry an entry for two conditions only:

• if the line contains the start of a subroutine

• if the line is the destination of a conditional or unconditional jump
  or branch instruction


While assembly programs sometimes put certain limitations on the choice
of labels, any suitable word or letter and number combination can be
used as a label for hand assembly. However, it makes sense to pick a
word or abbreviation that indicates what the subroutine or branch
destination is doing in the program (eg: "WAITLOOP," "COUNT," or simply
"LOOP 7").

The next column to fill in is the one with the heading Operand. When
writing programs for machine assembly, the programmer enters a symbolic
label in this field and leaves it up to the assembly program to figure
out what to do with it. When writing for hand assembly, the programmer
can make the task easier by being a bit more specific. The operand can
be one of the following things:


1. In the immediate addressing mode, it is simply the number that is to
be entered by the operation. Rather than give this number a symbolic
name which is defined somewhere in a symbol table, it is much easier to
enter it directly in the Operand column. One has to be careful to
remember which number system is being used. A number without a prefix
indicates decimal notation. The prefix % indicates binary notation. A
bit mask for bits 2 and 0, for example, would have the operand
% 0000 0101. If the number is in hexadecimal form, the prefix $ would
normally be used, but in this case it is much simpler to enter the
hexadecimal number directly in the OPC column of the following line.

2. With a jump or branch instruction, the operand symbol indicates the
destination of the operation. The operand of such an operation must
have a counterpart in the label column somewhere in the program. The
only exception is when the program calls subroutines that are stored in
read-only memory, as I do frequently with subroutines of the KIM
monitoring system. In this case, the operand symbol has to have a
counterpart in the stored program.

3. With any other memory referenced instruction, the operand must
symbolize a memory location. I have found it useful to think of these
locations as registers even though, unlike the registers of the
processor, they are physically located somewhere in memory. As a matter
of fact, their location, if possible, is in page 0 of the memory to
take advantage of the shorter addressing mode. For registers used in
stock subroutines, I have assigned locations which begin at the upper
end of page 0 and work their way downward. They are listed in a master
register list and care has been taken that subroutines that are likely
to be used in the same program do not occupy the same register
addresses. The symbolic names for registers that will be used in the
main program are noted in a program register table with the addresses
to be assigned later. (See table 1.) The symbols again should be words
or abbreviations which indicate the meaning of the data contained in
the register, such as STARLO to mean starting address, low-order byte.
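
The prefix convention described under item 1 can be illustrated with a
short conversion routine. This is a sketch in Python rather than one of
the author's utility routines; the function name operand_to_hex is
hypothetical, and it assumes that an immediate operand must fit in one
byte.

```python
def operand_to_hex(operand):
    """Convert an immediate operand to the two hexadecimal digits
    that go into the OPC column of the following line.

    No prefix means decimal, % means binary, $ means hexadecimal.
    """
    text = operand.replace(" ", "")
    if text.startswith("%"):
        value = int(text[1:], 2)
    elif text.startswith("$"):
        value = int(text[1:], 16)
    else:
        value = int(text, 10)
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate operand does not fit in one byte")
    return format(value, "02X")


print(operand_to_hex("% 0000 0101"))   # bit mask for bits 2 and 0
print(operand_to_hex("200"))           # decimal operand
```

The bit mask from the text converts to 05 and the decimal number 200 to
C8.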


The column N of the program 
assembly form can be used to indicate 
the number of cycles it takes to execute 
the instruction. This is necessary, for ex- 
ample, to determine the time of timing 
loops. In most cases, however, this col- 
umn will be left empty. 

Finally, the Comment column should 
be used to explain the function of the 
operation listed in the current line and 
sometimes some following lines. While 
this information may not be needed by 
the programmer, it is a tremendous help 
for any other person trying to under- 
stand what the program is doing. If the 
program has been flowcharted first, 
which is highly recommended for all but 


Listing 2: Program listing of BRAVEC using the author's hand-assembly
form for the KIM-1. This form can be used with any hexadecimal based
microprocessor.


Program: BRAVEC    Page 1 of 2    Date:    Programmer:

[VA-BECC Program Assembly Form. Legible column headings: Page, ADD,
OPC, Label, MNE, Mode. The form entries are illegible in this copy.]


the shortest programs, the comment can simply be a number which refers
to an equally numbered symbol on the flowchart.

In this way, the programmer works down the lines of the
program-assembly form. Every time a 0 is encountered in the ADD column,
the next more significant digit of the address is advanced. If that
addition carries into the Page column, it is also advanced. Eventually
the program will be completed and the hand assembly can begin. Like the
computer, I do this in a number of passes.

The first pass is the easiest one. Using a listing of the instruction
set, or the programmer reference chart, the mnemonic and the entry in
the Mode column are used to look up the op code of the instruction,
which is entered into the OPC column of the line. A frequent error
during this operation is to mistake an 8 for a B or vice versa, and I
double check op codes with these symbols. The programmer's reference
cards supplied by the manufacturers, although they fit nicely into a
shirt pocket, were apparently not intended for use by programmers over
40 years of age. The listing of the instruction set in the data sheets
or system manuals is usually printed in a more reasonable letter size.

The second step is to assign absolute 
addresses to the symbols of the program 
register list. First, all registers and their 
addresses used in stock subroutines to 
be called by the program are transferred 
from the master register list to the pro- 
gram register list. Then absolute ad- 
dresses are assigned to all other registers 
listed, making sure that no duplication 
occurs. Registers which contain the low- 
order and high-order bytes of numbers, 
or registers which contain successive 
bytes if multiple precision operations 
are used, have to be arranged in such a 
way that their absolute addresses are ad- 
jacent in increasing order (STARLO = B3, 
STARHI = B4). 

With the completed program register list one can go over the program
again. For each memory referenced instruction other than branch and
jump instructions, the program register list will contain an absolute
address for the symbol in the operand column. This hexadecimal number
is now entered into the OPC column of the following line. For registers
located outside of page 0 (such as the registers in PIAs) the address
will be entered in two lines, and care has to be taken to enter the
low-order byte first, followed by the high-order byte. During this pass
I also check all lines with a # in the Mode column and, if necessary,
convert the binary or decimal operand into hexadecimal notation which
is entered in the OPC column of the following line. With this step
completed, the OPC column should show a hexadecimal number in most
lines. The next step is to pass over the program again.
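
The byte order for addresses is an easy place to slip, so a small
illustration may help. The sketch below is in Python and is not part of
the author's method; the function name address_lines is hypothetical,
and it assumes that an address in page 0 is entered as a single byte
because the shorter addressing mode was chosen for it.

```python
def address_lines(address):
    """Return the OPC column entries for a 16-bit register address,
    in the order in which they follow the op code line."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    low = format(address & 0xFF, "02X")
    high = format(address >> 8, "02X")
    if address < 0x100:
        # Page 0: one address byte only.
        return [low]
    # Outside page 0: low-order byte first, then high-order byte.
    return [low, high]


print(address_lines(0x1C4F))   # START of the KIM monitor from table 1
print(address_lines(0x00FA))   # POINTL from table 1
```

For the address 1C4F the two lines after the op code read 4F and then
1C; for POINTL a single line reads FA.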

Any line with an open OPC column where the mnemonic indicates a branch
instruction will require that the branch vector for the relative
addressing mode be calculated. For short forward branches this poses no
problem because the offset can easily be counted off by beginning at
the second line following the one which contains the branch
instruction, and continuing to the line which has the corresponding
symbol in the label column. For longer branches and especially
backwards branches, if memory pages are crossed it is very easy to make
a mistake and miss by one count in either direction. I have found it
advantageous to let the microcomputer perform this operation because,
after all, it is much better in hexadecimal calculations than any
programmer.
The example program BRAVEC receives the origin and destination of a
branch and calculates the branch vector in two's complement notation. A
flag is set if the relative addressing range is exceeded. The program
is loaded from cassette tape beginning at memory location 0000. Loading
begins here because this location in the KIM-1 system can be addressed
easily by pressing the space bar of the connected terminal. The first
four locations are actually data registers into which the low- and
high-order bytes of origin and destination of the branch are entered.

When the program is executed beginning at location 0004, it displays or
prints the branch vector in two's complement as the low-order byte of
the address field. The high-order byte of this field normally shows 00,
while FF indicates that the reach of the relative addressing mode has
been exceeded.

While the program, as listed, is written for the 6502 microprocessor,
only instructions that have an equivalent in the instruction set for
the 6800 were used. The program, therefore, can be converted easily.
However, the registers POINTH and POINTL, which are displayed as an
address in the light-emitting diode (LED) display of the KIM-1
microcomputer, are specific for this system. For other computers, the
user will have to find another way of displaying the result of the
calculation.


Listing 2 continued:


Program: BRAVEC    Page 2 of 2    Date:    Programmer:

[VA-BECC Program Assembly Form. The form entries are illegible in this
copy.]


After all branch vectors have been calculated in this fashion and
entered in the appropriate lines, the only open spaces in the OPC
column should be the address parts of jump instructions. For
jumps within the main program it is easy 
to find the line with a matching entry in 
the label column and to enter the ad- 
dress of this line into the OPC columns 
of the lines following the one containing 
the jump instruction. For subroutines 
called from read-only memory, the ad- 
dress has to be looked up in the 
subroutine listing. 

Stock subroutines which have been written on some other occasion and
which can be loaded from magnetic or paper tape frequently can be used.
Normally such subroutines will be tacked on after the last memory
location occupied by the main program. The KIM-1 system has a
relocating loading routine for loading from magnetic tape. If this
feature is not available, some area in the memory should be set aside
into which the subroutines are loaded. A move program then can be
executed to pull up the subroutine. For the 6502 processor, I use a
program called MOVBLO which requires only fourteen program steps due to
one very convenient addressing mode of this processor.

Unless one is very pressed for memory space, it is a good idea to have
all subroutines start in lines with a 0 as the least significant digit
because it is easier to keep track of the starting address after
relocation. In order to be relocatable, a subroutine may not contain
any absolute jump instructions and only relative addressing within the
subroutine is permitted.

After the last addresses for the stock subroutines have been entered in
the program assembly form, the hand assembly is completed. I have never
clocked the operation, but by following the methods described, it goes
much faster than one would expect. With all op codes being listed in a
single column it is much easier to enter them into the machine, either
from a hexadecimal keyboard or from the keyboard of a terminal. This is
another occasion in which operator errors can easily occur and I
proofread all programs after entry. This operation is again greatly
simplified by the use of the assembly form which shows address and op
code in adjacent columns.

The assembly method and the assembly aids described have been in use
for several months and have been found to greatly reduce the likelihood
of assembly errors. Unfortunately, this method does not protect from
programming errors, and the debugging of the program still is a time
consuming but necessary step to follow the assembly of a program.


An Introduction to Polish Postfix Notation


D Wilson Cooke


We are living in an age of machines, 
and we must often adjust our ways to 
the ways of machines if we are to get 
along with the big black boxes. Com- 
munication of mathematical expressions 
to a computer or pocket calculator must 
be done in a form that the machine 
understands. This article discusses one 
aspect of this man-machine communica- 
tion: Polish notation. 

In high school algebra we learned to evaluate mathematical expressions
such as a² + b(c - a). This sort of algebraic notation, called infix
notation, has been
in wide use for many years. It has served 
us well, and it is the only notation that 
most people know. Many of the better 
scientific pocket calculators have keys 
for entering parentheses so that the user 
can calculate complex mathematical ex- 
pressions using this popular standard 
notation. Many high-level computer 
languages such as BASIC, FORTRAN, 
ALGOL, PL/I, etc, permit the program- 
mer to write rather complex math ex- 
pressions involving several operations 
and several variables using infix nota- 
tion. 

There is a form of notation called Polish notation or reverse Polish
notation (RPN) which is generally more efficient for computer
processing. The name "Polish" is in honor of the Polish mathematician,
Jan Lukasiewicz, who introduced this alternate form of notation in the
late 1920s. Some pocket calculators, like the Hewlett-Packard
scientific calculators, use reverse Polish, or Polish postfix,
notation. I will define this term later.


Characteristics of Standard 
Algebraic Notation 


Standard algebra expressions have two characteristics which distinguish
them from Polish expressions:

1. In infix notation, the operation symbol, or operator, is written
between the symbols representing the variables, or operands. Thus,
a + b means the sum of b added to a; a - b means the difference of b
subtracted from a; a ↑ b means the result of raising a to the b power,
and so on. We have all learned the shorthand way of writing products,
which looks like "ab," and raising to a power, which looks like "aᵇ,"
but this notation is not generally used in computer languages. The
operator must be included with the operands.

2. Infix notation expressions involving mixed operations are always
calculated in a definite order of precedence. For example:


• Expressions in parentheses are evaluated first and treated as a
  single term.

• Raising to a power (↑) is always the

Infix Notation          Polish Notation      Evaluation
A+B-C                   AB+C-                1
A*B+C*D                 AB*CD*+              26
A+B↑2*C                 AB2↑C*+              38
A*B/C                   AB*C/                1.5
A*B+A*(B*D+C↑E)         AB*ABD*CE↑+*+        8228
A*B-C+D/E               AB*C-DE/+            2.83333
A*(B-C)+D/E             ABC-*DE/+            -1.16667


Table 1: Comparison of infix and Polish notation for several
expressions. In this case A=2, B=3, C=4, D=5 and E=6.


[Figure 1 flowchart. Recoverable box labels: READ ONE TERM; IS TERM
...?; OUTPUT TO POLISH STRING; OUTPUT STACK TO POLISH STRING UNTIL "(",
DISCARD ")" AND "("; PRECEDENCE OF LAST READ OPERATOR < TOP OPERATOR.]


Figure 1: Flowchart for the step-by-step conversion of an infix
notation expression to a Polish notation string.


  first operator to be performed.

• Multiplications and divisions (*, /) are performed next.

• Additions and subtractions (+, -) are performed last.

• Within a sequence of multiplications or divisions containing no
  parentheses, evaluation is from left to right.

• Within a sequence of additions or subtractions containing no
  parentheses, evaluation is from left to right.


Using these characteristics of infix notation, we can construct a
correct and unambiguous evaluation of every expression, no matter how
complex. Consider the expression:


x = (ab - c(b - a)³)/(d + b)


This would be coded in BASIC as:


LET X = (A*B-C*(B-A)↑3)/(D+B)


The BASIC statement is an instruction to evaluate the expression to the
right of the equal sign according to the rules stated earlier, and to
then store the result as X. The evaluation proceeds as follows:


1. Because of the parentheses, A is subtracted from B, and D is added
   to B. Each result is stored temporarily.

2. The result of B - A is then raised to the third power (ie: it is
   used as a factor three times).

3. The result from step 2 is multiplied by C; also, A and B are
   multiplied together.

4. The first of the two results from step 3 is subtracted from the
   second because of the parentheses.

5. This result is divided by the second result of step 1.


The evaluation is complete, and the final result is written as X. It is
important that both the mathematics student and the computer programmer
understand this procedure if correct results are to be obtained. (I
have seen many incorrect answers on test papers because students failed
to apply the rules of procedure correctly.) If we let A = 2, B = 3,
C = 4 and D = 5, the final result for X should be 0.25.
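
The stated result can be verified by carrying out the evaluation one
operation at a time. The lines below are a sketch in Python rather than
BASIC; the intermediate variable names are hypothetical and stand for
the temporarily stored results.

```python
A, B, C, D = 2, 3, 4, 5

# Parenthesized subexpressions first.
difference = B - A          # B - A
total = D + B               # D + B

# Raising to a power comes before multiplication.
cube = difference ** 3      # (B - A) to the third power

# Multiplications next.
scaled = cube * C           # C*(B - A)^3
product = A * B             # A*B

# Subtraction inside the outer parentheses, then the division.
numerator = product - scaled
X = numerator / total

print(X)
```

With these values the numerator is 6 - 4 = 2 and the divisor is 8.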


Polish Notation Characteristics 


Infix notation is important because it is universally used from
elementary school onward. It is a "people-oriented" way of writing
mathematical expressions. Readers with only a casual interest in
computation on a computer with a high-level language, or the majority
of pocket calculators available today, will find this notation entirely
adequate. If, however, one is curious to learn more about the inner
workings of computer logic, a study of Polish notation is in order.
Hewlett-Packard points out that Polish notation is the most efficient
machine processing notation available today. Most computer-language
compilers will convert infix expressions to Polish expressions prior to
the conversion to machine language for evaluation.

There are several different forms of 
Polish notation in use, but all are based 
on the same principles. This article will 
treat only one version: Polish postfix 
(RPN) notation. Polish notation differs 
from infix in the two characteristics 
outlined earlier, namely the order of 
operators and operands, and the hierar- 
chy of precedence. 

Operators are written after operands. Thus AB+ means the sum of B added
to A; AB- means the difference of B subtracted from A; AB* means the
product of A multiplied by B; AB/ means the quotient of A divided by B;
and AB↑ means A raised to the B power.

In expressions involving mixed operations, evaluation proceeds from
left to right. No parentheses are used and there is no precedence of
operations.

Since this notation is unfamiliar to many people, several examples of
expressions in both infix and Polish notation along with numeric
results are presented in table 1. The mechanism for conversion from
infix to Polish form is a formal procedure which requires the use of
temporary storage of certain operation symbols, always keeping in mind
the order of precedence in infix notation. As each character is read
from left to right, a decision is made either to write it as output on
the Polish string or to store it for later use. The storage medium is a
stack or push-down list. It is sometimes called a last-in first-out
(LIFO) memory. The stack is a sequence of memory locations in which new
items can be added only from one end, pushing all other items down one
location. Items are removed from the same end, raising all other items
in the stack.

A physical illustration of a push-down stack is seen in many
restaurants where dishes are put in a well with a spring-loaded bottom.
As a new dish is added, all others shift down one position. As a dish
is removed, all other dishes move up one location. The last dish put on
the stack is the first to be removed.


Character  Character  Stack   Polish String    Comments
Number     Read
 1         A                  A                operand
 2         *          *       A                operation symbol is put on top of stack
 3         B          *       AB               operand
 4         +          +       AB*              + operation has less precedence than * on
                                               top of stack. * is output to Polish string
                                               and + is put on top of stack
 5         A          +       AB*A             operand
 6         *          *+      AB*A             * operation has more precedence than + on
                                               top of stack. * is put on top pushing +
                                               down
 7         (          (*+     AB*A             put parenthesis on stack
 8         B          (*+     AB*AB            operand
 9         *          *(*+    AB*AB            * operation has more precedence than ( on
                                               top of stack. * is put on top pushing (
                                               down
10         D          *(*+    AB*ABD           operand
11         +          +(*+    AB*ABD*          + operation has less precedence than * on
                                               top of stack. * is output to Polish string
                                               and test is repeated. + is greater than (
                                               so + is put on top of stack
12         C          +(*+    AB*ABD*C         operand
13         ↑          ↑+(*+   AB*ABD*C         ↑ operation has more precedence than + on
                                               top of stack. ↑ is put on top of stack
14         E          ↑+(*+   AB*ABD*CE        operand
15         )          *+      AB*ABD*CE↑+      ), all operators on stack are output up to
                                               (, ( is discarded
16                            AB*ABD*CE↑+*+    no more characters. all operators are
                                               output to the Polish string and evaluation
                                               is complete


Table 2: Step-by-step evaluation of the expression A*B+A*(B*D+C↑E). The
stack contents at any stage are listed from left to right with the top
of the stack at the left.


LOAD A
MULT B
STORE TEMP 1
LOAD A
STORE TEMP 2
LOAD B
MULT D
STORE TEMP 3
LOAD C
EXPN E


ADD TEMP 3
MULT TEMP 2
ADD TEMP 1

Conversion of infix to Polish notation 
proceeds as follows: 


@ Read the infix expression from left 
to right one character at a time. 
@Put operands (ie: variable or 
numbers) directly into the Polish 

string. 

@ Push all left parentheses tempor- 
arily onto the stack. 

@ Check all operators read for order 
of precedence in comparison with 
the operator on the top of the 
stack. 


The order of precedence is as follows: 


@ 1 is greater than * or /, which are 
greater than + or —, which are 
greater than (. 

a. If the operator last read is 
greater in precedence than the 
operator on the top of the stack, 
it is pushed onto the stack. 

b. If the operator last read is 
not greater in precedence than 
the operator on the top of the 
stack, the stack top is placed in 
the Polish string, and the test is 
repeated until this is no longer 
true. The operator last read is 
then put on the stack. 


; A is loaded into the accumulator 


; contents of accumulator multiplied by B 


@ When a right parenthesis is en- 
countered, all operators on the 
stack are sent to the string, starting 
with the top, until a left paren- 
thesis is encountered. This is dis- 
carded. 

@ When a terminating character indi- 
cates that the end of the infix 
expression has been read, the en- 
tire contents of the stack is output 
to the Polish string, starting with 
the top. 


Figure 1 presents these rules in flow- 
chart form, and table 2 shows a step by 
step conversion of the expression: 


A*B+A*(B*D+CtIE) 


into Polish notation. Following each step 
in table 2 along with the rules above (or 
the flowchart) should help the reader to 
understand the rules. 

When an expression involves some 
function such as sine, square root, loga- 
rithm or arc tangent, a subroutine is 
usually called to execute the function 
after the Polish string is evaluated. One 
may write the name of the function at 
the end of the Polish string to indicate 
this. 

Once a Polish string is obtained, it is 
reasonably easy to read the string from 
left to right and generate assembler 
code for computer processing, or to 


; contents of accumulator stored temporarily 


; Ais loaded into the accumulator 


; contents of accumulator stored temporarily 


; Bis loaded into the accumulator 


; contents of accumulator multiplied by D 


; contents of accumulator stored temporarily 


; C is loaded into the accumulator 


} contents of accumulator raised to E power. This most likely will be a subroutine call of 


some sort 


; contents of TEMP 3 added to accumulator 
; contents of accumulator multiplied by TEMP 2 
; contents of TEMP 1 added to the accumulator; The accumulator now contains the final 


result 


Table 3: Hypothetical assembly code listing for the Polish notation string AB*ABD*CEt + * +. The terms were chosen 
for clarity rather than for any particular assembler. The storage locations TEMP1, TEMP2, and TEMP3 could be located 
in programmable memory; ideally, they would be part of a push-down stack. Note that the locations are used in the 
reverse order from the way they were created. Polish notation pocket calculators have a push-down stack available. 
Loading is usually done with an ENTER key. 


212 


read and enter numbers in a pocket 
calculator following the instructions 
supplied by the manufacturer. Table 3 
presents a hypothetical assembler 
translation of the Polish string AB*ABD* 
CEt+*+ obtained in table 2. It is 
‘impossible to evaluate this expression as 
written on the HP—45 because of 
limited stack capacity, but a slight 
modification of the expression to eli- 
minate the need for three temporary 
storage locations in the stack will solve 
this problem. Had the original infix 
expression been written as (B*D + 
CtE)*A + A*B prior to translation to the 
Polish string BD*CEt+A*AB*+, the 
need for three temporary stack storage 
locations would have been eliminated, 
and the HP—45 could have handled the 
expression. The lesson is that if you are 
using a computer or calculator with 
limited stack capacity, write your 
algebra to keep a minimum of storage 
locations. 
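The conversion rules given earlier can be tried out in a few lines of a modern language. What follows is a minimal Python sketch, not the author's program: the function name `infix_to_postfix`, the `PRECEDENCE` table, and the use of `^` in place of the ↑ symbol are assumptions made for illustration. It handles only single-character operands and well-formed expressions, and, following rule b literally, it treats operators of equal precedence (including the power operator) as grouping from left to right.

```python
# Precedence as stated in the rules: power > * / > + - > left parenthesis.
# '^' stands in for the up-arrow power symbol (an assumption of this sketch).
PRECEDENCE = {"^": 3, "*": 2, "/": 2, "+": 1, "-": 1, "(": 0}


def infix_to_postfix(expression):
    """Convert a well-formed infix string to a Polish postfix string."""
    output = []  # the Polish string being built
    stack = []   # push-down stack; the top of the stack is the end of the list
    for ch in expression:
        if ch.isspace():
            continue
        if ch.isalnum():
            # Operands go directly to the Polish string.
            output.append(ch)
        elif ch == "(":
            # Left parentheses are pushed temporarily onto the stack.
            stack.append(ch)
        elif ch == ")":
            # Output operators down to the matching "(", then discard it.
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        else:
            # Rule b: while the operator read is NOT greater than the stack
            # top, move the stack top to the Polish string and test again.
            while stack and PRECEDENCE[ch] <= PRECEDENCE[stack[-1]]:
                output.append(stack.pop())
            # Rule a (or an empty stack): push the operator read.
            stack.append(ch)
    # End of expression: output whatever is left, starting with the top.
    while stack:
        output.append(stack.pop())
    return "".join(output)


print(infix_to_postfix("A*B+A*(B*D+C^E)"))
print(infix_to_postfix("(B*D+C^E)*A+A*B"))
```

Run on the two forms of the example expression, the sketch produces the same two Polish strings quoted in the text, with `^` where the article prints ↑.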

A word should be said about the unary minus sign. The expression -A is
interpreted to mean the additive inverse or opposite of A, and the "-" is
not a subtraction sign. A new symbol could be invented to distinguish this
from subtraction, but this is not necessary. We can interpret -A either as
meaning -1*A, where -1 is the number negative one, or as meaning to add the
additive inverse of A. An application of a little high school algebra
theory will usually clear up the problem quickly.

It is generally not necessary for the 
average computer user to become an ex- 
pert at quick translation of complex 
infix expressions into Polish expressions. 
After all, computers were invented to do 
routine work of this sort and free the 
mind for creative thinking. Anyone who 
already has a compiler for his computer 
has the ability to make the necessary 
translation from infix notation to 
machine language. 

Compiler design differs from machine 
to machine, but the design philosophy is 
similar. If your computer is without a 
compiler, it might be interesting to write 
a program that will perform the 
necessary translations. If a programmer 
knows how his compiler works, he can 
use this knowledge to write his expres- 
sions so that they will compile in the 
most efficient machine-language code. 
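To see how the way an expression is written affects the resources needed to evaluate it, the following minimal Python sketch evaluates a Polish string with a stack and records how deep the stack gets. The function name `evaluate_postfix`, the sample operand values, and the use of `^` for the power operator are all assumptions made for illustration. Note that the depth reported here counts operands waiting on an evaluation stack, which is not the same bookkeeping as the accumulator and temporary locations of table 3.

```python
import operator

# '^' stands in for the power operator (an assumption of this sketch).
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def evaluate_postfix(polish, values):
    """Evaluate a postfix string of single-character operands.

    Returns (result, deepest stack level reached).
    """
    stack = []
    max_depth = 0
    for ch in polish:
        if ch in OPERATIONS:
            # The operand pushed last is the right-hand operand.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[ch](left, right))
        else:
            stack.append(values[ch])
        max_depth = max(max_depth, len(stack))
    return stack.pop(), max_depth


# Arbitrary sample values chosen for the demonstration.
sample = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 2}
print(evaluate_postfix("AB*ABD*CE^+*+", sample))  # original ordering
print(evaluate_postfix("BD*CE^+A*AB*+", sample))  # rearranged ordering
```

With these sample values both strings give the same result, 68, but the first ordering needs five stack levels while the rearranged one needs only three. HP calculators of that period are generally described as having a four-register stack; that figure is an assumption here, not something the article states.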

It is beyond the scope of this article to go into the area of compiler
design, but I hope it has helped the reader to understand mathematics
notation and the use of Polish notation. We will not soon abandon our
standard algebra notation in favor of something else. This generation and
the next will undoubtedly be raised on infix notation. But who knows? Your
grandchild may come home from first grade some day and ask you if 2 2 + is
really 4.




Microprocessor Memory Testing


Larry Lee 


Semiconductor memory certainly 
dominates the memory market when it 
comes to microprocessors. It is cheap, 
readily available in a wide variety of 
sizes, speeds, packages, etc, and it is 
easy to use. It also possesses good relia- 
bility characteristics once the initial 
“burn in” failures are weeded out. Find- 
ing these failures is absolutely essential 
if your system is to operate reliably, or 
even operate at all. If the memory is not 
100 percent functional, then the system 
cannot be expected to perform as 
planned. 

Memory failures can be very difficult 
to find. How do you find a memory chip 
that is pattern sensitive, or one whose 
access time is a little slow sometimes? If 
you have a high performance integrated 
circuit tester available you can screen 
out the obviously defective parts rapid- 
ly; but, however good integrated circuit 
testers may be, they are not perfect. 
Some chips that work fine on the tester 
may not work in your system. There 
could be several reasons for this: your 
system may have more noise than the 
tester; the input voltages may be dif- 
ferent; the timing could be significantly 
different. The only test that really 
counts is whether or not the chips work 
correctly in your system. 

If you use large quantities of program- 
mable memory, then it would probably 
be worth your while to invest in a good 
integrated-circuit tester. But if you only 
use small quantities of memory chips, it 
would definitely not pay to invest 
$50,000 or more in a high-performance integrated-circuit tester. In this
case there are three alternatives:

1) Pay a little more and have the manufacturer test the chips.

2) Find an independent company with the proper tester and pay them to
   test the parts.

3) Develop a few short programs for your microprocessor and test the
   chips yourself.


The way I have found to be most ef-
fective in testing semiconductor
memory is to plug the chips into the 
memory board and let the processor do 
the testing. This way the memory is 
tested in its actual operating environ- 
ment. Note that this works only if the 
processor, address bus, data bus and 
control bus are known to be 100% func- 
tional. If any of the test-system com- 
ponents are faulty, then the concept of a 
self-testing system will not work. This 
kind of testing also requires some 
memory that is assumed to be good in 
order to hold the test program. 

What do you test? Basically, you test the memory chips to see if they will
accept and retain data that is given to them by the processor. How do you
do it? The obvious answer is to write something into memory, and then read
it to make sure that it was written properly and is readable. Using the
concept of system self-test, the easiest way to check all the locations in
a memory is to write a program to lay down a known pattern through memory,
and then read the pattern back. It would then be little problem to output a
list of all incorrect locations on your display device. But what pattern do
you write into memory? This is an important decision since some patterns
are better than others at finding errors. The easiest pattern to run might
be to write all 1s, read that, then write all zeros and read that. This
type of test would find three kinds of errors: gross chip failure; a bit or
bits stuck at logical 1; a bit or bits stuck at logical 0. This test also
might find gross access time failures. But it probably would not discover
chips that were a little slow due to the nature of the data stored in them.
In addition, this test would not find adjacent bit failures, another common
type of error. This failure occurs when two bits appear to be "stuck"
together. This type of failure can be due to metalization shorts on the
chip, leakage from one memory cell to another through the substrate, or
moving an address line while the read/write input is in the write mode. One
way of detecting this type of failure is to fill the chip with all zeros,
then change one location at a time to a one. After one bit is changed the
entire memory is read to ensure that only one location has changed.

[Figure 1: Flowchart for test 1 and its variation.]

Another type of memory chip failure is the thermal related failure. Usually
chips fail when they heat up excessively (although it is possible that they
might not work right until they warm up a little). There are basically two
kinds of thermal related failure: access time failure and loss of data.
Access time failures stem from the fact that as the chip temperature rises
the access time increases. Loss of data failures occur because internal
chip thresholds and voltage levels shift with changes in temperature,
resulting in a memory cell or cells becoming unable to retain data. Both of
these failures can be extremely difficult to find. It could take hours for
the problem to show up, and as soon as you power down to check or change
something, the chip could cool down enough to make the problem disappear
for another half hour or so. One solution to this type of failure might be
to put a small fan somewhere in your system where it can blow cool air over
your memory. This should circumvent the problem and definitely would
prolong chip life.

[Figure 2: Flowchart for test 2.]

In the program I run there are two basic tests, with a variation of each.
The first test, as flowcharted in figure 1, runs in about ten seconds for
8 K bytes on an 8008. It finds gross chip failures, bits stuck at 1 or 0,
most adjacent bit failures and some access time failures. The second test,
which is flowcharted in figure 2, runs in about six hours for 8 K bytes and
finds adjacent bit and thermal related failures.

The first test scans through memory twice. The first pass is used to
generate and write a pattern through memory. The second pass is used to
verify that the pattern remains correct in memory. The algorithm for this
test is as follows: there are two 16-bit pointers maintained in the
processor's registers. One pointer is set to the starting location of the
memory to be tested. The other is set to the last location to be tested.
The upper and lower eight bits of the address in the start pointer are
added together and the result is stored in the memory location pointed to
by the start pointer. The start and stop pointers are compared. If they are
equal, then something has been written into every location and it is time
to branch to the read routine. If the two pointers are not equal, then
increment the start pointer and branch back to the add step.

The read routine consists of setting 
the start pointer back to the start of 
memory to be tested, adding the upper 
and lower eight bits of the address in the 
start pointer together, and comparing 
the result with the contents of memory 
pointed to by the start pointer. If the 
two are not the same, then there is an 
error and the program should branch to 
an error routine. If they are the same, 
then compare the start pointer to the 
end pointer. If these pointers are equal, 
then the test is complete and control 
should be transferred to the system 
monitor or to the next test. If not equal, 
then increment the start pointer and 
branch back to the add step. 

A useful variation of this test is as
follows: after adding the upper and 
lower eight bits of the start pointer, store 
the complement of the result in memory 
and continue as before. The read routine 
would be identical to the first read 
routine, except that the program would 
check for the complement of the sum of
the upper and lower eight bits of the 
start pointer. Together these test pro- 
grams check every bit in memory for a 
one and a zero. 
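The first test and its variation can be sketched in Python against a simulated memory. This is a hedged illustration, not the author's 8008 program: the names `address_pattern`, `run_test1` and `StuckBitMemory` are invented here, the memory is modeled as an indexable object, and the 8-bit sum is assumed to discard any carry. Instead of jumping to an error routine, the sketch collects each mismatch as an (address, expected, actual) tuple.

```python
def address_pattern(address, complement=False):
    """Sum of the upper and lower address bytes, kept to 8 bits."""
    value = ((address >> 8) + (address & 0xFF)) & 0xFF  # carry discarded
    return value ^ 0xFF if complement else value


def run_test1(memory, start, stop, complement=False):
    """Write the pattern through memory, then read it back and compare."""
    # First pass: lay the pattern down from start through stop inclusive.
    for address in range(start, stop + 1):
        memory[address] = address_pattern(address, complement)
    # Second pass: regenerate the pattern and verify every location.
    errors = []
    for address in range(start, stop + 1):
        expected = address_pattern(address, complement)
        actual = memory[address]
        if actual != expected:
            errors.append((address, expected, actual))
    return errors


class StuckBitMemory:
    """Simulated faulty memory: one bit of one cell always reads as 0."""

    def __init__(self, size, bad_address, bad_bit):
        self.cells = bytearray(size)
        self.bad_address = bad_address
        self.mask = ~(1 << bad_bit) & 0xFF

    def __setitem__(self, address, value):
        self.cells[address] = value

    def __getitem__(self, address):
        value = self.cells[address]
        return value & self.mask if address == self.bad_address else value


faulty = StuckBitMemory(8192, bad_address=0x0123, bad_bit=0)
print(run_test1(faulty, 0, 8191))                   # fault not seen
print(run_test1(faulty, 0, 8191, complement=True))  # fault caught
```

In this simulated case the pattern written at address 0123 is hexadecimal 24, whose lowest bit is already 0, so the plain test does not notice the bit stuck at 0. The complemented pattern, DB, reads back as DA and the fault is reported, which illustrates why the two tests are meant to be run together.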

It is possible for errors to slip through the test just presented, but it
is very unlikely that any defective memory chips would get through the next
test undetected. The first part of this test (refer to figure 2) starts out
by filling memory with all 0s and putting a hexadecimal FF in the first
location to be tested. The program then verifies that the FF is in memory
correctly and that all other locations still contain 0s. After this is
done, the program writes hexadecimal 00 in the previous address location
and hexadecimal FF in the next location. It then checks the previous
location for 00, the next for a hexadecimal FF and all other address
locations for a hexadecimal 00. The program then continues this procedure
as it marches a hexadecimal FF through memory, stopping when the test
pointer is greater than the stop pointer. Note that at any time there is
only one location in memory that contains a hexadecimal FF. The variation
of this test is to fill the memory with ones and march hexadecimal 0s
through in the same way as just described.
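The marching test can be sketched the same way. Again this is only an illustrative Python model under stated assumptions: `run_test2` and `CoupledMemory` are invented names, the memory is simulated, and errors are collected as (test address, read address, expected, actual) tuples rather than handed to an error routine. The nested loops make the running time grow with the square of the memory size, which is consistent with the long run times quoted for this test.

```python
def run_test2(memory, start, stop, background=0x00, marcher=0xFF):
    """March one `marcher` byte through memory filled with `background`."""
    errors = []
    # Fill the whole region with the background value.
    for address in range(start, stop + 1):
        memory[address] = background
    for test in range(start, stop + 1):
        # Place the marching byte, then read the entire region.
        memory[test] = marcher
        for read in range(start, stop + 1):
            expected = marcher if read == test else background
            actual = memory[read]
            if actual != expected:
                errors.append((test, read, expected, actual))
        # Restore the background before moving on to the next location.
        memory[test] = background
    return errors


class CoupledMemory:
    """Simulated fault: any write to `aggressor` also lands in `victim`."""

    def __init__(self, size, aggressor, victim):
        self.cells = bytearray(size)
        self.aggressor = aggressor
        self.victim = victim

    def __setitem__(self, address, value):
        self.cells[address] = value
        if address == self.aggressor:
            self.cells[self.victim] = value

    def __getitem__(self, address):
        return self.cells[address]


faulty = CoupledMemory(32, aggressor=5, victim=6)
print(run_test2(faulty, 0, 31))                                  # march FF
print(run_test2(faulty, 0, 31, background=0xFF, marcher=0x00))   # variation
```

In both runs the simulated coupling between locations 5 and 6 shows up as a single error: when the marching byte is written at location 5, location 6 is found to have changed as well.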

This test and its variation can take 
hours to run, depending on processor 
speed and memory size, and is intended 
to be run either whenever you add new 
memory to a system or if you suspect a 
memory problem. Because of its speed, 
the test flowcharted in figure 1 can be 
run at almost any time. Some appro- 
priate times to run this test might be 
whenever you power up the system or 
just before you run an important pro- 
gram. 

Whenever any of the tests detects an 
error it should immediately call the error 
routine. The purpose of the error routine 
is to give the operator as much informa- 
tion as possible about the error, such as 
the error data, the data that should have 
been there and the address of the error. 
After the error message is complete, the 
processor should continue testing from 
the next location. The error routine 
should not affect the test routine. An 
error counter might be included to stop 
the test if too many errors are detected. 
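An error routine along these lines might look like the following Python sketch. The class name `ErrorLog`, the exception `TooManyErrors`, the message layout, and the default limit of 16 are all assumptions for illustration; the "bad bits" field (the exclusive OR of the expected and actual data) is an extra that the article does not mention.

```python
class TooManyErrors(Exception):
    """Raised when the error counter reaches its limit."""


class ErrorLog:
    """Collects one message per error and stops the test at a limit."""

    def __init__(self, limit=16):
        self.limit = limit
        self.lines = []

    def report(self, address, expected, actual):
        # Record the address, the data that should have been there,
        # the data actually read, and which bits differ.
        self.lines.append(
            f"ADDR {address:04X}  EXPECTED {expected:02X}  "
            f"READ {actual:02X}  BAD BITS {expected ^ actual:08b}"
        )
        # The error counter: give up once too many errors have been seen.
        if len(self.lines) >= self.limit:
            raise TooManyErrors(f"{len(self.lines)} errors, test stopped")


log = ErrorLog(limit=16)
log.report(0x0123, 0xDB, 0xDA)
print(log.lines[0])
```

Because `report` returns normally until the limit is reached, a test loop that calls it can simply carry on with the next location, as the text recommends.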

By spending a little time writing a few 
fairly simple test programs, you can save 
a lot of time and trouble debugging your 
system. Not only will you find memory 
problems faster, but you will have more 
confidence that your system will run as 
planned. 


An Algorithm for Drawing Lines 


Louis J Cesa 
Eduardo Kellerman 
Robert B Hitchcock Sr 


Copyright by International Business Machines Corporation, 1978. Printed by
permission.


This article describes an algorithm that generates, in sequence, the points
that best approximate a straight line between two endpoints in an integer
coordinate system. The algorithm makes no use of multiplication or
division. Thus, it is particularly suitable for implementation in
microprocessors which do not support these operations. In addition, no
calculation generated by the algorithm exceeds 1.5 times the larger of the
horizontal or vertical distance between endpoints, a value that is exceeded
by all other algorithms known to the authors. This means that this
algorithm can be used without overflow on as large as a 170 by 170 grid
using 8-bit arithmetic only.

Given the coordinates (X1,Y1) and (X2,Y2) for the two endpoints of the line
to be drawn, the algorithm computes the points that best approximate the
straight line. In the steps below, X and Y are the current point, starting
at (X1,Y1), and DX and DY are the horizontal and vertical distances between
the endpoints, |X2-X1| and |Y2-Y1|. The algorithm generates the points
starting with (X1,Y1) and ending with (X2,Y2) as follows:


STEP 1:  Set R = 0.
STEP 2:  Output X, Y (next point).
STEP 3:  If DX < DY then go to STEP 10.
STEP 4:  If X = X2 then stop.
STEP 5:  Change X: If X2 > X1 then increment X by 1.
                   If X2 < X1 then decrement X by 1.
STEP 6:  Set R = R + DY.
STEP 7:  Decide if Y is to be changed:
         If R < DX - R then go to STEP 2.
STEP 8:  Change Y: If Y2 > Y1 then increment Y by 1.
                   If Y2 < Y1 then decrement Y by 1.
STEP 9:  Set R = R - DX, then go to STEP 2.
STEP 10: If Y = Y2 then stop.
STEP 11: Change Y: If Y2 > Y1 then increment Y by 1.
                   If Y2 < Y1 then decrement Y by 1.
STEP 12: Set R = R + DX.
STEP 13: Decide if X is to be changed:
         If R < DY - R then go to STEP 2.
STEP 14: Change X: If X2 > X1 then increment X by 1.
                   If X2 < X1 then decrement X by 1.
STEP 15: Set R = R - DY, then go to STEP 2.


As an example, let’s set point (X1,Y1) = 
(0,0) and point (X2,Y2) = (9,3). The result 
of executing the algorithm is as follows: 


POINTS      PLOT

0,0
1,0         Y ↑
2,1           3         XX
3,1           2      XXX
4,1           1   XXX
5,2           0 XX
6,2             0123456789
7,2                       X →
8,3
9,3
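The fifteen steps translate almost directly into a short routine. The following Python sketch is one possible rendering, not the authors' code: the function name `line_points` is invented, and DX and DY are assumed to be the absolute horizontal and vertical distances between the endpoints, since the steps use them without defining them. Only addition, subtraction and comparison are used, as in the original.

```python
def line_points(x1, y1, x2, y2):
    """Return the points approximating the line from (x1,y1) to (x2,y2)."""
    dx = abs(x2 - x1)  # assumed meaning of DX
    dy = abs(y2 - y1)  # assumed meaning of DY
    step_x = 1 if x2 > x1 else -1
    step_y = 1 if y2 > y1 else -1
    x, y, r = x1, y1, 0          # STEP 1
    points = [(x, y)]            # STEP 2 for the first point
    if dx >= dy:                 # STEP 3: X is the faster-moving axis
        while x != x2:           # STEP 4
            x += step_x          # STEP 5
            r += dy              # STEP 6
            if not (r < dx - r): # STEP 7
                y += step_y      # STEP 8
                r -= dx          # STEP 9
            points.append((x, y))
    else:                        # STEPS 10 to 15: Y is the faster axis
        while y != y2:
            y += step_y
            r += dx
            if not (r < dy - r):
                x += step_x
                r -= dy
            points.append((x, y))
    return points


for px, py in line_points(0, 0, 9, 3):
    print(f"{px},{py}")
```

For the endpoints (0,0) and (9,3) this prints the same ten points as the POINTS column above.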




The BYTE Books Library 


Bar Code Loader ... Ken Budnik
BASEX: A Simple Language and Compiler for 8080 Systems ... Paul Warme
BASIC Scientific Subroutines, Volume 1 ... Fred Ruckdeschel
Beginner's Guide for the UCSD Pascal System ... Kenneth L. Bowles
Beyond Games: Systems Software for Your 6502 Personal Computer ... Ken Skier
The Brains of Men and Machines ... Ernest Kent
Build Your Own Z80 Computer ... Steve Ciarcia
The BYTE Book of Computer Music ... Christopher Morgan (ed.)
The BYTE Book of Pascal ... Blaise Liffick (ed.)
Ciarcia’s Circuit Cellar ... Steve Ciarcia
Ciarcia’s Circuit Cellar, Volume II ... Steve Ciarcia
Digital Harmony: On the Complementarity of Music and Visual Art ... John Whitney
Inversions ... Scott Kim
K2FDOS: A Floppy Disk Operating System for the 8080 ... Kenneth Welles
LINK68: An M6800 Linking Loader ... Robert Grappel and Jack Hemenway
Magic Machine ... Theodore Cohen and Jacqueline Bray
Microcomputer Structures ... Henry D’Angelo
MONDEB: An Advanced M6800 Monitor Debugger ... Don Peters
Programming Techniques: Program Design ... Blaise Liffick (ed.)
Programming Techniques: Simulation ... Blaise Liffick (ed.)
Programming Techniques: Numbers in Theory and Practice ... Blaise Liffick (ed.)
RA6800ML: An M6800 Relocatable Macro Assembler ... Jack Hemenway
Superwumpus ... Jack Emmerichs
Tiny Assembler 6800: Version 3.1 ... Jack Emmerichs
Threaded Interpretive Languages ... R.G. Loeliger
TRACER: A 6800 Debugging Program ... Robert Grappel and Jack Hemenway
You Just Bought a Personal What? ... Thomas Dwyer and Margot Critchfield


For a complete catalog of our publications, write: 


BYTE Books 
70 Main Street 
Peterborough, NH 03458 




Programming Techniques is a 
collection of the best articles 
from BYTE magazine plus new 
material concerned with the 
art and science of computer 
programming. The basic prin- 
ciple of the series is to provide 
the personal computer user 
with sufficient background 
information to write and 
maintain programs effectively. 
The fourth book so far 
scheduled in this series is BITS 
AND PIECES. This book is a 
collection of miscellaneous, 
unrelated articles which con- 
cern many of the programming 
techniques essential to personal 
computing. Areas such as 
multi-programming and inter- 
active computing with the 
personal computer are dis- 
cussed, as well as stacks, sorting, 
Polish notation, and program
optimization.

Other books in the Programming Techniques series are: 
Program Design ISBN 0-931718-12-0 
Simulation ISBN 0-931718-13-9 


Numbers in Theory and Practice ISBN 0-931718-14-7 


ISBN 0-07-037828-2 


